Array for detecting microbes

ABSTRACT

The present embodiments relate to an array system for detecting and identifying biomolecules and organisms. More specifically, the present embodiments relate to an array system comprising a microarray configured to simultaneously detect a plurality of organisms in a sample at a high confidence level.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of, and claims the benefit of priority to, PCT Application No. PCT/US2007/024720, filed Nov. 29, 2007, which was written in English, published in English as WO/2008/130394 and designated the United States of America, which claims priority under 35 U.S.C. §119(e) to U.S. Provisional Application No. 60/861,834 filed Nov. 30, 2006, both of which are hereby incorporated by reference in their entirety.

STATEMENT REGARDING FEDERALLY SPONSORED R&D

This invention was made with Government support under Grant No. DE-AC03-76SF00098 from the Department of Homeland Security and Contract No. DE-AC0-05CH11231 from the Department of Energy.

REFERENCE TO SEQUENCE LISTING

The present application is being filed along with a Sequence Listing in electronic format. The Sequence Listing is provided as a file entitled LBNL026C1.TXT, created May 28, 2009, which is 5.51 KB in size. The information in the electronic format of the Sequence Listing is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present embodiments relate to an array system for detecting and identifying biomolecules and organisms. More specifically, the present embodiments relate to an array system comprising a microarray configured to simultaneously detect a plurality of organisms in a sample at a high confidence level.

2. Description of the Related Art

In the fields of molecular biology and biochemistry, biopolymers such as nucleic acids and proteins from organisms are identified and/or fractionated in order to search for useful genes, diagnose diseases or identify organisms. A hybridization reaction is frequently used as a pretreatment for such processes, where a target molecule in a sample is hybridized with a nucleic acid or a protein having a known sequence. For this purpose, microarrays, or DNA chips, are used on which probes such as DNAs, RNAs or proteins with known sequences are immobilized at predetermined positions.

A DNA microarray (also commonly known as gene or genome chip, DNA chip, or gene array) is a collection of microscopic DNA spots attached to a solid surface, such as glass, plastic or silicon chip forming an array. The affixed DNA segments are known as probes (although some sources will use different nomenclature), thousands of which can be used in a single DNA microarray. Measuring gene expression using microarrays is relevant to many areas of biology and medicine, such as studying treatments, disease, and developmental stages. For example, microarrays can be used to identify disease genes by comparing gene expression in diseased and normal cells.

Molecular approaches designed to describe organism diversity routinely rely upon classifying heterogeneous nucleic acids amplified by universal 16S RNA gene PCR (polymerase chain reaction). The resulting mixed amplicons can be quickly, but coarsely, typed into anonymous groups using T-/RFLP (Terminal Restriction Fragment Length Polymorphism), SSCP (single-strand conformation polymorphism) or T/DGGE (temperature/denaturing gradient gel electrophoresis). These groups may be classified through sequencing, but this requires additional labor to physically isolate each 16S RNA type, does not scale well for large comparative studies such as environmental monitoring, and is only suitable for low complexity environments. Also, the number of clones that would be required to adequately catalogue the majority of taxa in a sample is too large to be efficiently or economically handled. As such, an improved array and method is needed to efficiently analyze a plurality of organisms without the disadvantages of the above technologies.

SUMMARY OF THE INVENTION

Some embodiments relate to an array system including a microarray configured to simultaneously detect a plurality of organisms in a sample, wherein the microarray comprises fragments of 16s RNA unique to each organism and variants of said fragments comprising at least 1 nucleotide mismatch, wherein the level of confidence of species-specific detection derived from fragment matches is about 90% or higher.

In one aspect, the plurality of organisms comprise bacteria or archaea.

In another aspect, the fragments of 16s RNA are clustered and aligned into groups of similar sequence such that detection of an organism based on at least 1 fragment match is possible.

In yet another aspect, the level of confidence of species-specific detection derived from fragment matches is about 95% or higher.

In still another aspect, the level of confidence of species-specific detection derived from fragment matches is about 98% or higher.

In some embodiments, the majority of fragments of 16s RNA unique to each organism have a corresponding variant fragment comprising at least 1 nucleotide mismatch.

In some aspects, every fragment of 16s RNA unique to each organism has a corresponding variant fragment comprising at least 1 nucleotide mismatch.

In other aspects, the fragments are about 25 nucleotides long.

In some aspects, the sample is an environmental sample.

In other aspects, the environmental sample comprises at least one of soil, water or atmosphere.

In yet other aspects, the sample is a clinical sample.

In still other aspects, the clinical sample comprises at least one of tissue, skin, bodily fluid or blood.

Some embodiments relate to a method of detecting an organism including applying a sample comprising a plurality of organisms to the array system which includes a microarray that comprises fragments of 16s RNA unique to each organism and variants of said fragments comprising at least 1 nucleotide mismatch, wherein the level of confidence of species-specific detection derived from fragment matches is about 90% or higher; and identifying organisms in the sample.

In some aspects, the plurality of organisms comprise bacteria or archaea.

In other aspects, the majority of fragments of 16s RNA unique to each organism have a corresponding variant fragment comprising at least 1 nucleotide mismatch.

In still other aspects, every fragment of 16s RNA unique to each organism has a corresponding variant fragment comprising at least 1 nucleotide mismatch.

In yet other aspects, the fragments are about 25 nucleotides long.

In some aspects, the organism to be detected is the most metabolically active organism in the sample.

Some embodiments relate to a method of fabricating an array system including identifying 16s RNA sequences corresponding to a plurality of organisms of interest; selecting fragments of 16s RNA unique to each organism; creating variant RNA fragments corresponding to the fragments of 16s RNA unique to each organism which comprise at least 1 nucleotide mismatch; and fabricating said array system.

In some aspects, the plurality of organisms comprise bacteria or archaea.

In other aspects, the majority of fragments of 16s RNA unique to each organism have a corresponding variant fragment comprising at least 1 nucleotide mismatch.

In still other aspects, every fragment of 16s RNA unique to each organism has a corresponding variant fragment comprising at least 1 nucleotide mismatch.

In yet other aspects, the fragments are about 25 nucleotides long.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a bar graph showing rank-abundance curve of phylotypes within the urban aerosol clone library obtained from San Antonio calendar week 29. Phylotypes were determined by clustering at 99% homology using nearest neighbor joining.

FIG. 2 is a line graph showing that Chao1 and ACE richness estimators are non-asymptotic, indicating an under-estimation of predicted richness based on numbers of clones sequenced.

FIG. 3 is a graph showing a Latin square assessment of 16S rRNA gene sequence quantitation by microarray.

FIG. 4 is a graph showing comparison of real-time PCR and array monitoring of Pseudomonas oleovorans density in aerosol samples from San Antonio. Corrected Array Hybridization Score is the ln(intensity) normalized by internal spikes as described under Normalization.

DETAILED DESCRIPTION

The present embodiments are related to an array system for detecting and identifying biomolecules and organisms. More specifically, the present embodiments relate to an array system comprising a microarray configured to simultaneously detect a plurality of organisms in a sample at a high confidence level.

In some embodiments, the array system uses multiple probes for increasing confidence of identification of a particular organism using a 16S rRNA gene targeted high density microarray. The use of multiple probes can greatly increase the confidence level of a match to a particular organism. Also, in some embodiments, mismatch control probes corresponding to each perfect match probe can be used to further increase confidence of sequence-specific hybridization of a target to a probe. Probes with one or more mismatches can be used to indicate non-specific binding and a possible non-match. This has the advantage of reducing false positive results due to non-specific hybridization, which is a significant problem with many current microarrays.

Some embodiments of the invention relate to a method of using an array to simultaneously identify multiple prokaryotic taxa with a relatively high confidence. A taxon is an individual microbial species or group of highly related species that share an average of about 97% 16S rRNA gene sequence identity. The array system of the current embodiments may use multiple confirmatory probes, each with from about 1 to about 20 corresponding mismatch control probes to target the most unique regions within a 16S rRNA gene for about 9000 taxa. Preferably, each confirmatory probe has from about 1 to about 10 corresponding mismatch probes. More preferably, each confirmatory probe has from about 1 to about 5 corresponding mismatch probes. The aforementioned about 9000 taxa represent a majority of the taxa that are currently known through 16S rRNA clone sequence libraries. In some embodiments, multiple targets can be assayed through a high-density oligonucleotide array. The sum of all target hybridizations is used to identify specific prokaryotic taxa. The result is a much more efficient and less time consuming way of identifying unknown organisms that, in addition to providing results that could not previously be achieved, can also provide results in hours that other methods would require days to achieve.

In some embodiments, the array system of the present embodiments can be fabricated using 16s rRNA sequences as follows. From about 1 to about 500 short probes can be designed for each taxonomic group. In some embodiments, the probes can be proteins, antibodies, tissue samples or oligonucleotide fragments. In certain examples, oligonucleotide fragments are used as probes. In some embodiments, from about 1 to about 500 short oligonucleotide probes, preferably from about 2 to about 200 short oligonucleotide probes, more preferably from about 5 to about 150 short oligonucleotide probes, even more preferably from about 8 to about 100 short oligonucleotide probes can be designed for each taxonomic grouping, allowing for the failure of one or more probes. In one example, at least about 11 short oligonucleotide probes are used for each taxonomic group. The oligonucleotide probes can each be from about 5 bp to about 100 bp, preferably from about 10 bp to about 50 bp, more preferably from about 15 bp to about 35 bp, even more preferably from about 20 bp to about 30 bp. In some embodiments, the probes may be 5-mers, 6-mers, 7-mers, 8-mers, 9-mers, 10-mers, 11-mers, 12-mers, 13-mers, 14-mers, 15-mers, 16-mers, 17-mers, 18-mers, 19-mers, 20-mers, 21-mers, 22-mers, 23-mers, 24-mers, 25-mers, 26-mers, 27-mers, 28-mers, 29-mers, 30-mers, 31-mers, 32-mers, 33-mers, 34-mers, 35-mers, 36-mers, 37-mers, 38-mers, 39-mers, 40-mers, 41-mers, 42-mers, 43-mers, 44-mers, 45-mers, 46-mers, 47-mers, 48-mers, 49-mers, 50-mers, 51-mers, 52-mers, 53-mers, 54-mers, 55-mers, 56-mers, 57-mers, 58-mers, 59-mers, 60-mers, 61-mers, 62-mers, 63-mers, 64-mers, 65-mers, 66-mers, 67-mers, 68-mers, 69-mers, 70-mers, 71-mers, 72-mers, 73-mers, 74-mers, 75-mers, 76-mers, 77-mers, 78-mers, 79-mers, 80-mers, 81-mers, 82-mers, 83-mers, 84-mers, 85-mers, 86-mers, 87-mers, 88-mers, 89-mers, 90-mers, 91-mers, 92-mers, 93-mers, 94-mers, 95-mers, 96-mers, 97-mers, 98-mers, 99-mers, 100-mers or combinations thereof.

Non-specific cross hybridization can be an issue when an abundant 16S rRNA gene shares sufficient sequence similarity to non-targeted probes, such that a weak but detectable signal is obtained. The use of sets of perfect match and mismatch probes (PM-MM) effectively minimizes the influence of cross-hybridization. In certain embodiments, each perfect match probe (PM) has one corresponding mismatch probe (MM) to form a pair that is useful for analyzing a particular 16S rRNA sequence. In other embodiments, each PM has more than one corresponding MM. Additionally, different PMs can have different numbers of corresponding MM probes. In some embodiments, each PM has from about 1 to about 20 MM, preferably, each PM has from about 1 to about 10 MM and more preferably, each PM has from about 1 to about 5 MM.

Any of the nucleotide bases can be replaced in the MM probe to result in a probe having a mismatch. In one example, the central nucleotide base can be replaced with any of the three non-matching bases. In other examples, more than one nucleotide base in the MM is replaced with a non-matching base. In some examples, 10 nucleotides are replaced in the MM, in other examples, 5 nucleotides are replaced in the MM, in yet other examples 3 nucleotides are replaced in the MM, and in still other examples, 2 nucleotides are replaced in the MM. This is done so that the increased hybridization intensity signal of the PM over the one or more MM indicates a sequence-specific positive hybridization. By requiring multiple PM-MM probes to have a confirmation interaction, the chance that the hybridization signal is due to a predicted target sequence is substantially increased.

In other embodiments, the 16S rRNA gene sequences can be grouped into distinct taxa such that a set of the short oligonucleotide probes that are specific to the taxon can be chosen. In some examples, the 16s rRNA gene sequences grouped into distinct taxa are from about 100 bp to about 1000 bp, preferably the gene sequences are from about 400 bp to about 900 bp, more preferably from about 500 bp to about 800 bp. The resulting about 9000 taxa represented on the array, each containing from about 1% to about 5% sequence divergence, preferably about 3% sequence divergence, can represent substantially all demarcated bacterial and archaeal orders.

In some embodiments, for a majority of the taxa represented on the array, probes can be designed from regions of gene sequences that have only been identified within a given taxon. In other embodiments, some taxa have no probe-level sequence that can be identified that is not shared with other groups of 16S rRNA gene sequences. For these taxonomic groupings, a set of from about 1 to about 500 short oligonucleotide probes, preferably from about 2 to about 200 short oligonucleotide probes, more preferably from about 5 to about 150 short oligonucleotide probes, even more preferably from about 8 to about 100 short oligonucleotide probes can be designed to a combination of regions on the 16S rRNA gene that taken together as a whole do not exist in any other taxa. For the remaining taxa, a set of probes can be selected to minimize the number of putative cross-reactive taxa. For all three probe set groupings, the advantage of the hybridization approach is that multiple taxa can be identified simultaneously by targeting unique regions or combinations of sequence.

In some embodiments, oligonucleotide probes can then be selected to obtain an effective set of probes capable of correctly identifying the sample of interest. In certain embodiments, the probes are chosen based on various taxonomic organizations useful in the identification of particular sets of organisms.

In some embodiments, the chosen oligonucleotide probes can then be synthesized by any available method in the art. Some examples of suitable methods include printing with fine-pointed pins onto glass slides, photolithography using pre-made masks, photolithography using dynamic micromirror devices, ink-jet printing or electrochemistry. In one example, a photolithographic method can be used to directly synthesize the chosen oligonucleotide probes onto a surface. Suitable examples for the surface include glass, plastic, silicon and any other surface available in the art. In certain examples, the oligonucleotide probes can be synthesized on a glass surface at an approximate density of from about 1,000 probes per μm² to about 100,000 probes per μm², preferably from about 2000 probes per μm² to about 50,000 probes per μm², more preferably from about 5000 probes per μm² to about 20,000 probes per μm². In one example, the density of the probes is about 10,000 probes per μm². The array can then be arranged in any configuration, such as, for example, a square grid of rows and columns. Some areas of the array can be oligonucleotide 16S rDNA PM or MM probes, and others can be used for image orientation, normalization controls or other analyses. In some embodiments, materials for fabricating the array can be obtained from Affymetrix, GE Healthcare (Little Chalfont, Buckinghamshire, United Kingdom) or Agilent Technologies (Palo Alto, Calif.).

In some embodiments, the array system is configured to have controls. Some examples of such controls include 1) probes that target amplicons of prokaryotic metabolic genes spiked into the 16S rDNA amplicon mix in defined quantities just prior to fragmentation and 2) probes complementary to a pre-labeled oligonucleotide added into the hybridization mix. The first control collectively tests the fragmentation, biotinylation, hybridization, staining and scanning efficiency of the array system. It also allows the overall fluorescent intensity to be normalized across all the arrays in an experiment. The second control directly assays the hybridization, staining and scanning of the array system. However, the array system of the present embodiments is not limited to these particular examples of possible controls.

The accuracy of the array of some embodiments has been validated by comparing the results of some arrays with 16S rRNA gene sequences from approximately 700 clones in each of 3 samples. A specific taxon is identified as being present in a sample if a majority (from about 70% to about 100%, preferably from about 80% to about 100% and more preferably from about 90% to about 100%) of the probes on the array have a hybridization signal about 100 times, 200 times, 300 times, 400 times or 500 times greater than that of the background and the perfect match probe has a significantly greater hybridization signal than its one or more partner mismatch control probe or probes. This ensures a higher probability of a sequence specific hybridization to the probe. In some embodiments, the use of multiple probes, each independently indicating that the target sequence of the taxonomic group being identified is present, increases the probability of a correct identification of the organism of interest.
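The presence criterion described above can be illustrated by the following minimal sketch. It assumes that the criterion is applied to the set of PM-MM probe pairs assigned to one taxon, and it uses a hypothetical ratio (min_pm_mm_ratio) to stand in for "significantly greater", since no numerical test is specified herein. The names ProbePair, pair_is_positive and taxon_is_present are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbePair:
    pm: float  # perfect match (PM) probe intensity
    mm: float  # paired mismatch (MM) control probe intensity


def pair_is_positive(pair, background, fold_over_background=100.0,
                     min_pm_mm_ratio=1.3):
    """Score one probe pair as positive or negative.

    fold_over_background follows the "about 100 times" figure in the text.
    min_pm_mm_ratio is an assumed stand-in for "significantly greater".
    """
    above_background = pair.pm >= fold_over_background * background
    above_mismatch = pair.pm >= min_pm_mm_ratio * pair.mm
    return above_background and above_mismatch


def taxon_is_present(pairs, background, min_fraction=0.9, **thresholds):
    """Call a taxon present when enough of its probe pairs are positive."""
    if not pairs:
        return False
    positives = sum(pair_is_positive(p, background, **thresholds) for p in pairs)
    return positives / len(pairs) >= min_fraction
```

With 11 probe pairs and min_fraction of 0.9, one failed pair is tolerated (10/11 is about 0.91) while two failed pairs are not (9/11 is about 0.82).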

Biomolecules, such as proteins, DNA, RNA, DNA from amplified products and native rRNA from the 16S rRNA gene, for example, can be probed by the array of the present embodiments. In some embodiments, probes are designed to be antisense to the native rRNA so that directly labeled rRNA from samples can be placed directly on the array to identify a majority of the actively metabolizing organisms in a sample with no bias from PCR amplification. Actively metabolizing organisms have significantly higher numbers of ribosomes used for the production of proteins; therefore, in some embodiments, the capacity to make proteins at a particular point in time of a certain organism can be measured. This is not possible in systems where only the 16S rRNA gene DNA is measured, which encodes only the potential to make proteins and is the same whether an organism is actively metabolizing or quiescent or dead. In this way, the array system of the present embodiments can directly identify the metabolizing organisms within diverse communities.

In some embodiments, the array system is able to measure the microbial diversity of complex communities without PCR amplification, and consequently, without all of the inherent biases associated with PCR amplification. Actively metabolizing cells typically have about 20,000 or more ribosomal copies within their cell for protein assembly compared to quiescent or dead cells that have few. In some embodiments, rRNA can be purified directly from environmental samples and processed with no amplification step, thereby avoiding any of the biases caused by the preferential amplification of some sequences over others. Thus, in some embodiments the signal from the array system can reflect the true number of rRNA molecules that are present in the samples, which can be expressed as the number of cells multiplied by the number of rRNA copies within each cell. The number of cells in a sample can then be inferred by several different methods, such as, for example, quantitative real-time PCR, or FISH (fluorescence in situ hybridization). Then the average number of ribosomes within each cell may be calculated.
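The relationship stated above (rRNA molecules equal cells multiplied by rRNA copies per cell) can be rearranged to give the average copies per cell, as in the following sketch. It assumes that the array signal has already been converted to an rRNA molecule count by a calibration that is not described herein, and the function name and the example figures are hypothetical.

```python
def mean_rrna_copies_per_cell(total_rrna_molecules, cell_count):
    """Average rRNA copies per cell = total rRNA molecules / number of cells.

    total_rrna_molecules: assumed to come from a calibrated array signal.
    cell_count: assumed to come from an independent method such as
    quantitative real-time PCR or FISH.
    """
    if cell_count <= 0:
        raise ValueError("cell_count must be positive")
    return total_rrna_molecules / cell_count


# Illustrative numbers only: 2.0e9 rRNA molecules across 1.0e5 cells
# corresponds to the "about 20,000" copies cited for active cells.
example = mean_rrna_copies_per_cell(2.0e9, 1.0e5)
```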

In some embodiments, the samples used can be environmental samples from any environmental source, for example, naturally occurring or artificial atmosphere, water systems, soil or any other sample of interest. In some embodiments, the environmental samples may be obtained from, for example, atmospheric pathogen collection systems, sub-surface sediments, groundwater, ancient water deep within the ground, plant root-soil interface of grassland, coastal water and sewage treatment plants. Because of the ability of the array system to simultaneously test for such a broad range of organisms based on almost all known 16s rRNA gene sequences, the array system of the present embodiments can be used in any environment, which also distinguishes it from other array systems which generally must be targeted to specific environments.

In other embodiments, the sample used with the array system can be any kind of clinical or medical sample. For example, samples from blood, the lungs or the gut of mammals may be assayed using the array system. Also, the array system of the present embodiments can be used to identify an infection in the blood of an animal. The array system of the present embodiments can also be used to assay medical samples that are directly or indirectly exposed to the outside of the body, such as the lungs, ear, nose, throat, the entirety of the digestive system or the skin of an animal. Hospitals currently lack the resources to identify the complex microbial communities that reside in these areas.

Another advantage of the present embodiments is that simultaneous detection of a majority of currently known organisms is possible with one sample. This allows for much more efficient study and determination of particular organisms within a particular sample. Current microarrays do not have this capability. Also, with the array system of the present embodiments, the top metabolizing organisms within a sample can be detected simultaneously without bias from PCR amplification, greatly increasing the efficiency and accuracy of the detection process.

Some embodiments relate to methods of detecting an organism in a sample using the described array system. These methods include contacting a sample with one organism or a plurality of organisms to the array system of the present embodiments and detecting the organism or organisms. In some embodiments, the organism or organisms to be detected are bacteria or archaea. In some embodiments, the organism or organisms to be detected are the most metabolically active organism or organisms in the sample.

Some embodiments relate to a method of fabricating an array system including identifying 16s RNA sequences corresponding to a plurality of organisms of interest, selecting fragments of 16s RNA unique to each organism and creating variant RNA fragments corresponding to the fragments of 16s RNA unique to each organism which comprise at least 1 nucleotide mismatch and then fabricating the array system.

The following examples are provided for illustrative purposes only, and are in no way intended to limit the scope of the present invention.

EXAMPLE 1

An array system was fabricated using 16s rRNA sequences taken from a plurality of bacterial species. A minimum of 11 different, short oligonucleotide probes were designed for each taxonomic grouping, allowing one or more probes to not bind, but still give a positive signal in the assay. Non-specific cross hybridization is an issue when an abundant 16S rRNA gene shares sufficient sequence similarity to non-targeted probes, such that a weak but detectable signal is obtained. The use of a perfect match-mismatch (PM-MM) probe pair effectively minimized the influence of cross-hybridization. In this technique, the central nucleotide is replaced with any of the three non-matching bases so that the increased hybridization intensity signal of the PM over the paired MM indicates a sequence-specific, positive hybridization. By requiring multiple PM-MM probe-pairs to have a positive interaction, the chance that the hybridization signal is due to a predicted target sequence is substantially increased.

The known 16S rRNA gene sequences larger than 600 bp were grouped into distinct taxa such that a set of at least 11 probes that were specific to each taxon could be selected. The resulting 8,935 taxa (8,741 of which are represented on the array), each containing approximately 3% sequence divergence, represented all 121 demarcated bacterial and archaeal orders. For a majority of the taxa represented on the array (5,737, 65%), probes were designed from regions of 16S rRNA gene sequences that have only been identified within a given taxon. For 1,198 taxa (14%) no probe-level sequence could be identified that was not shared with other groups of 16S rRNA gene sequences, although the gene sequence as a whole was distinctive. For these taxonomic groupings, a set of at least 11 probes was designed to a combination of regions on the 16S rRNA gene that taken together as a whole did not exist in any other taxa. For the remaining 1,806 taxa (21%), a set of probes was selected to minimize the number of putative cross-reactive taxa. Although more than half of the probes in this group have a hybridization potential to one outside sequence, this sequence was typically from a phylogenetically similar taxon. For all three probe set groupings, the advantage of the hybridization approach is that multiple taxa can be identified simultaneously by targeting unique regions or combinations of sequence.
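As a consistency check on the three probe set groupings above, the following sketch sums the stated counts and computes each share of the 8,741 taxa represented on the array. The dictionary labels are shorthand introduced here; the counts and the quoted whole-number percentages are taken from the text.

```python
# Taxa counts and quoted percentages for the three probe set groupings.
groupings = {
    "probes unique to the taxon": (5737, 65),
    "unique only as a combination of regions": (1198, 14),
    "cross-reactivity minimized": (1806, 21),
}

# The three groupings should account for every taxon on the array.
total_on_array = sum(count for count, _ in groupings.values())

# Share of each grouping, in percent, to one decimal place.
shares = {
    label: round(100.0 * count / total_on_array, 1)
    for label, (count, _) in groupings.items()
}
```

The counts sum to 8,741, and the computed shares (65.6%, 13.7% and 20.7%) each agree with the quoted figures to within one percentage point.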

EXAMPLE 2

An array system was fabricated according to the following protocol. 16S rDNA sequences (Escherichia coli base pair positions 47 to 1473) were obtained from over 30,000 16S rDNA sequences that were at least 600 nucleotides in length in the 15 Mar. 2002 release of the 16S rDNA database, “Greengenes.” This region was selected because it is bounded on both ends by universally conserved segments that can be used as PCR priming sites to amplify bacterial or archaeal genomic material using only 2 to 4 primers. Putative chimeric sequences were filtered from the data set using computer software, preventing them from being misconstrued as novel organisms. The filtered sequences are considered to be the set of putative 16S rDNA amplicons. Sequences were clustered to enable each sequence of a cluster to be complementary to a set of perfectly matching (PM) probes. Putative amplicons were placed in the same cluster as a result of common 17-mers found in the sequence.

The resulting 8,988 clusters, each containing approximately 3% sequence divergence, were considered operational taxonomic units (OTUs) representing all 121 demarcated prokaryotic orders. The taxonomic family of each OTU was assigned according to the placement of its member organisms in Bergey's Taxonomic Outline. The taxonomic outline as maintained by Philip Hugenholtz was consulted for phylogenetic classes containing uncultured environmental organisms or unclassified families belonging to named higher taxa. The OTUs comprising each family were clustered into sub-families by transitive sequence identity. Altogether, 842 sub-families were found. The taxonomic position of each OTU as well as the accompanying NCBI accession numbers of the sequences composing each OTU are recorded and publicly available.

The objective of the probe selection strategy was to obtain an effective set of probes capable of correctly categorizing mixed amplicons into their proper OTU. For each OTU, a set of 11 or more specific 25-mers (probes) were sought that were prevalent in members of a given OTU but were dissimilar from sequences outside the given OTU. In the first step of probe selection for a particular OTU, each of the sequences in the OTU was separated into overlapping 25-mers, the potential targets. Then each potential target was matched to as many sequences of the OTU as possible. First, a text pattern was used for a search to match potential targets and sequences; however, since partial gene sequences were included in the reference set, additional methods were performed. Therefore, the multiple sequence alignment provided by Greengenes was used to provide a discrete measurement of group size at each potential probe site. For example, if an OTU containing seven sequences possessed a probe site where one member was missing data, then the site-specific OTU size was only six.
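The first step of probe selection can be sketched as follows. This is a simplified illustration, not the software actually used: it assumes ungapped sequences for the 25-mer enumeration, and it assumes that missing data in the alignment is marked with a dedicated character (a period is used here as a hypothetical convention). The function names are illustrative.

```python
def overlapping_kmers(sequence, k=25):
    """Return every overlapping k-mer of a sequence, left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def site_specific_otu_size(aligned_members, start, width, missing="."):
    """Count OTU members that have complete data across one probe site.

    aligned_members: rows of a multiple sequence alignment for one OTU.
    start, width: alignment columns spanned by the potential probe site.
    missing: character assumed to mark absent data in a partial sequence.
    """
    return sum(
        1
        for row in aligned_members
        if len(row) >= start + width and missing not in row[start:start + width]
    )
```

For an OTU of seven aligned members in which one member lacks data at the site, site_specific_otu_size returns six, matching the example above.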

In ranking the possible targets, those having data for all members of that OTU were preferred over those found only in a fraction of the OTU members. In the second step, a subset of the prevalent targets was selected and reverse complemented into probe orientation, avoiding those capable of mis-hybridization to an unintended amplicon. Probes presumed to have the capacity to mis-hybridize were those 25-mers that contained a central 17-mer matching sequences in more than one OTU. Thus, probes that were unique to an OTU solely due to a distinctive base in one of the outer four bases were avoided. Also, probes with mis-hybridization potential to sequences having a common tree node near the root were favored over those with a common node near the terminal branch.
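The central 17-mer screen and the conversion into probe orientation can be sketched as follows. The sketch assumes that screening is performed on the target-orientation 25-mer by exact substring search against ungapped reference sequences keyed by OTU; the actual search method is not specified herein, and all names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def central_kmer(target, k=17):
    """Return the central k bases; for a 25-mer this drops 4 bases per end."""
    offset = (len(target) - k) // 2
    return target[offset:offset + k]


def may_mishybridize(target, sequences_by_otu):
    """True when the central 17-mer occurs in sequences of more than one OTU."""
    core = central_kmer(target)
    otus_hit = {
        otu
        for otu, sequences in sequences_by_otu.items()
        if any(core in seq for seq in sequences)
    }
    return len(otus_hit) > 1


def to_probe(target):
    """Convert an accepted target into probe orientation."""
    return reverse_complement(target)
```

A target whose only distinguishing bases lie in the outer four positions shares its central 17-mer with another OTU and is therefore flagged by may_mishybridize.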

Probes complementary to target sequences that were selected for fabrication were termed perfectly matching (PM) probes. As each PM probe was chosen, it was paired with a control 25-mer (mismatching probe, MM), identical in all positions except the thirteenth base. The MM probe did not contain a central 17-mer complementary to sequences in any OTU. The probe complementing the target (PM) and MM probes constitute a probe pair analyzed together.
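The pairing of each PM probe with an MM control can be sketched as follows. The sketch assumes exact substring search against ungapped reference sequences and simply takes the first acceptable substitution; how the substituted base was actually chosen among the three alternatives is not stated herein. The function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def mismatch_candidates(pm_probe, position=13):
    """Return the three 25-mers that differ from the PM probe only at the
    given 1-based position (the thirteenth base by default)."""
    index = position - 1
    return [
        pm_probe[:index] + base + pm_probe[index + 1:]
        for base in "ACGT"
        if base != pm_probe[index]
    ]


def choose_mismatch(pm_probe, reference_sequences, core_length=17):
    """Pick a candidate whose central 17-mer is not complementary to any
    reference sequence; return None if no candidate qualifies."""
    for candidate in mismatch_candidates(pm_probe):
        offset = (len(candidate) - core_length) // 2
        core = candidate[offset:offset + core_length]
        core_target = reverse_complement(core)
        if not any(core_target in seq for seq in reference_sequences):
            return candidate
    return None
```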

The chosen oligonucleotides were synthesized by a photolithographic method at Affymetrix Inc. (Santa Clara, Calif., USA) directly onto a 1.28 cm by 1.28 cm glass surface at an approximate density of 10,000 probes per μm². Each unique probe sequence on the array had a copy number of roughly 3.2×10⁶ (personal communication, Affymetrix). The entire array of 506,944 features was arranged as a square grid of 712 rows and columns. Of these features, 297,851 were oligonucleotide 16S rDNA PM or MM probes, and the remaining were used for image orientation, normalization controls or other unrelated analyses. Each DNA chip had two kinds of controls on it: 1) probes that target amplicons of prokaryotic metabolic genes spiked into the 16S rDNA amplicon mix in defined quantities just prior to fragmentation and 2) probes complementary to a pre-labeled oligonucleotide added into the hybridization mix. The first control collectively tested the fragmentation, biotinylation, hybridization, staining and scanning efficiency. It also allowed the overall fluorescent intensity to be normalized across all the arrays in an experiment. The second control directly assayed the hybridization, staining and scanning.
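The array dimensions quoted above are mutually consistent, as the following arithmetic sketch suggests. It assumes that the 712 by 712 features tile the full 1.28 cm side without gaps and that the quoted density refers to probe molecules per μm²; both are assumptions made for illustration, and the function names are hypothetical.

```python
def feature_count(grid_side):
    """Number of features in a square grid."""
    return grid_side * grid_side


def feature_pitch_um(array_side_cm, grid_side):
    """Edge length of one feature in micrometers, assuming gap-free tiling."""
    return array_side_cm * 1e4 / grid_side


def copies_per_feature(array_side_cm, grid_side, molecules_per_um2):
    """Approximate probe molecules in one feature."""
    pitch = feature_pitch_um(array_side_cm, grid_side)
    return pitch * pitch * molecules_per_um2


features = feature_count(712)                    # 712 x 712 grid
pitch = feature_pitch_um(1.28, 712)              # about 18 micrometers
copies = copies_per_feature(1.28, 712, 10_000)   # about 3.2 million
```

Under these assumptions each feature is about 18 μm on a side and holds roughly 3.2×10⁶ molecules, in line with the stated copy number, and 712 squared equals the stated 506,944 features.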

EXAMPLE 3

A study was done on the diverse and dynamic bacterial populations in urban aerosols utilizing an array system of certain embodiments. Air samples were collected using an air filtration collection system under vacuum located within six EPA air quality network sites in both San Antonio and Austin, Tex. Approximately 10 liters of air per minute were collected in a polyethylene terephthalate (Celanex), 1.0 μm filter (Hoechst Celanese). Samples were collected daily over a 24 h period. Sample filters were washed in 10 mL buffer (0.1 M Sodium Phosphate, 10 mM EDTA, pH 7.4, 0.01% Tween-20), and the suspension was stored frozen until extracted. Samples were collected from 4 May to 29 Aug. 2003.

Sample dates were divided according to a 52-week calendar year starting Jan. 1, 2003, with each Monday to Sunday cycle constituting a full week. Samples from four randomly chosen days within each sample week were extracted. Each date chosen for extraction consisted of 0.6 mL filter wash from each of the six sampling sites for that city (San Antonio or Austin) combined into a “day pool” before extraction. In total, for each week, 24 filters were sampled.

The “day pools” were centrifuged at 16,000×g for 25 min and the pellets were resuspended in 400 μL sodium phosphate buffer (100 mM, pH 8). The resuspended pellets were transferred into 2 mL silica bead lysis tubes containing 0.9 g of silica/zirconia lysis bead mix (0.3 g of 0.5 mm zirconia/silica beads and 0.6 g of 0.1 mm zirconia/silica beads). For each lysis tube, 300 μL buffered sodium dodecyl sulfate (SDS) (100 mM sodium chloride, 500 mM Tris pH 8, 10% [w/v] SDS), and 300 μL phenol:chloroform:isoamyl alcohol (25:24:1) were added. Lysis tubes were inverted and flicked three times to mix buffers before bead mill homogenization with a Bio101 Fast Prep 120 machine (Qbiogene, Carlsbad, Calif.) at 6.5 m s⁻¹ for 45 s. Following centrifugation at 16,000×g for 5 min, the aqueous supernatant was removed to a new 2 mL tube and kept at −20° C. for 1 hour to overnight. An equal volume of chloroform was added to the thawed supernatant prior to vortexing for 5 s and centrifugation at 16,000×g for 3 min. The supernatant was then combined with two volumes of a binding buffer “Solution 3” (UltraClean Soil DNA kit, MoBio Laboratories, Solana Beach, Calif.). Genomic DNA from the mixture was isolated on a MoBio spin column, washed with “Solution 4” and eluted in 60 μL of 1× Tris-EDTA according to the manufacturer's instructions. The DNA was further purified by passage through a Sephacryl S-200 HR spin column (Amersham, Piscataway, N.J., USA) and stored at 4° C. prior to PCR amplification. DNA was quantified using a PicoGreen fluorescence assay according to the manufacturer's recommended protocol (Invitrogen, Carlsbad, Calif.).

The 16S rRNA gene was amplified from the DNA extract using universal primers 27F.1 (5′ AGRGTTTGATCMTGGCTCAG) (SEQ ID NO: 1) and 1492R (5′ GGTTACCTTGTTACGACTT) (SEQ ID NO: 2). Each PCR reaction mix contained 1× Ex Taq buffer (Takara Bio Inc, Japan), 0.8 mM dNTP mixture, 0.02 U/μL Ex Taq polymerase, 0.4 mg/mL bovine serum albumin (BSA), and 1.0 μM each primer. PCR conditions were 1 cycle of 3 min at 95° C., followed by 35 cycles of 30 sec at 95° C., 30 sec at 53° C., and 1 min at 72° C., and finishing with 7 min incubation at 72° C. When the total mass of PCR product for a sample week reached 2 μg (by gel quantification), all PCR reactions for that week were pooled and concentrated to a volume less than 40 μL with a Microcon YM100 spin filter (Millipore, Billerica, Mass.) for microarray analysis.

The pooled PCR product was spiked with known concentrations of synthetic 16S rRNA gene fragments and non-16S rRNA gene fragments according to Table S1. This mix was fragmented using DNase I (0.02 U/μg DNA, Invitrogen, CA) and One-Phor-All buffer (Amersham, N.J.) per Affymetrix's protocol, with incubation at 25° C. for 10 min, followed by enzyme denaturation at 98° C. for 10 min. Biotin labeling was performed using an Enzo® BioArray™ Terminal Labeling Kit (Enzo Life Sciences Inc., Farmingdale, N.Y.) per the manufacturer's directions. The labeled DNA was then denatured (99° C. for 5 min) and hybridized to the DNA microarray at 48° C. overnight (>16 hr). The microarrays were washed and stained per the Affymetrix protocol.

The array was scanned using a GeneArray Scanner (Affymetrix, Santa Clara, Calif., USA). The scan was recorded as a pixel image and analyzed using standard Affymetrix software (Microarray Analysis Suite, version 5.1) that reduced the data to an individual signal value for each probe. Background probes were identified as those producing intensities in the lowest 2% of all intensities. The average intensity of the background probes was subtracted from the fluorescence intensity of all probes. The noise value (N) was the variation in pixel intensity signals observed by the scanner as it read the array surface. The standard deviation of the pixel intensities within each of the identified background cells was divided by the square root of the number of pixels comprising that cell. The average of the resulting quotients was used for N in the calculations described below.
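The background subtraction and noise estimate described above might be computed along these lines. The sketch assumes each cell is summarized by three parallel lists (signal intensity, pixel standard deviation, pixel count); the function name, the argument layout, and the rounding of the 2% cutoff are assumptions made for illustration.

```python
import math
from statistics import mean


def background_and_noise(intensity, pixel_sd, n_pixels, fraction=0.02):
    """Return (background-corrected intensities, background, noise N).

    intensity: signal value per cell
    pixel_sd:  standard deviation of pixel intensities within each cell
    n_pixels:  number of pixels in each cell
    """
    # Cells with the lowest 2% of intensities are treated as background.
    n_bg = max(1, int(len(intensity) * fraction))
    bg_cells = sorted(range(len(intensity)), key=intensity.__getitem__)[:n_bg]

    background = mean(intensity[i] for i in bg_cells)
    # N is the mean over background cells of (pixel SD / sqrt(pixel count)).
    noise = mean(pixel_sd[i] / math.sqrt(n_pixels[i]) for i in bg_cells)

    corrected = [value - background for value in intensity]
    return corrected, background, noise
```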

Probe pairs scored as positive were those that met two criteria: (i) the intensity of fluorescence from the perfectly matched probe (PM) was greater than 1.3 times the intensity from the mismatched control (MM), and (ii) the difference in intensity, PM minus MM, was at least 130 times greater than the squared noise value (>130 N²). These two criteria were chosen empirically to provide stringency while maintaining sensitivity to the amplicons known to be present from sequencing results of cloning the San Antonio week 29 sample. The positive fraction (PosFrac) was calculated for each probe set as the number of positive probe pairs divided by the total number of probe pairs in a probe set. A taxon was considered present in the sample when over 92% of its assigned probe pairs for its corresponding probe set were positive (PosFrac>0.92). This was determined based on empirical data from clone library analyses. Hybridization intensity (hereafter referred to as intensity) was calculated in arbitrary units (a.u.) for each probe set as the trimmed average (maximum and minimum values removed before averaging) of the PM minus MM intensity differences across the probe pairs in a given probe set. All intensities <1 were shifted to 1 to avoid errors in subsequent logarithmic transformations. When summarizing chip results to the sub-family, the probe set producing the highest intensity was used.
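The scoring rules above translate directly into a short routine. The following is a minimal sketch, assuming each probe set is a list of (PM, MM) background-corrected intensity tuples with at least three pairs; the function names are illustrative, and the strict inequality in criterion (ii) follows the parenthetical ">130 N²".

```python
def pair_is_positive(pm: float, mm: float, noise: float) -> bool:
    """Criteria (i) PM > 1.3 x MM and (ii) PM - MM > 130 x N squared."""
    return pm > 1.3 * mm and (pm - mm) > 130 * noise ** 2


def pos_frac(pairs, noise: float) -> float:
    """Fraction of probe pairs in a probe set that score positive."""
    return sum(pair_is_positive(pm, mm, noise) for pm, mm in pairs) / len(pairs)


def taxon_present(pairs, noise: float) -> bool:
    """A taxon is called present when PosFrac exceeds 0.92."""
    return pos_frac(pairs, noise) > 0.92


def probe_set_intensity(pairs) -> float:
    """Trimmed mean of PM - MM (one minimum and one maximum removed), floored at 1."""
    diffs = sorted(pm - mm for pm, mm in pairs)
    trimmed = diffs[1:-1]
    return max(sum(trimmed) / len(trimmed), 1.0)
```

With 13 probe pairs, for example, a single failing pair still gives PosFrac = 12/13 ≈ 0.923 and the taxon is called present, whereas two failing pairs (11/13 ≈ 0.846) are enough to reject it.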

To compare the diversity of bacteria detected with microarrays to a known standard, one sample week was chosen for cloning and sequencing and for replicate microarray analysis. One large pool of SSU amplicons (96 reactions, 50 μL/reaction) from San Antonio week 29 was made. One milliliter of the pooled PCR product was gel purified and 768 clones were sequenced at the DOE Joint Genome Institute (Walnut Creek, Calif.) by standard methods. An aliquot of this pooled PCR product was also hybridized to a microarray (three replicate arrays performed). Sub-families containing a taxon scored as present in all three array replicates were recorded. Individual cloned rRNA genes were sequenced from each terminus, assembled using Phred and Phrap (S9, S10, S11), and were required to pass quality tests of Phred 20 (base call error probability <10^(−2.0)) to be included in the comparison.

Sequences that appeared chimeric were removed using Bellerophon (S2) with two requirements: (1) the preference score must be less than 1.3 and (2) the divergence ratio must be less than 1.1. The divergence ratio is a new metric implemented to weight the likelihood of a sequence being chimeric according to the similarity of the parent sequences. The more distantly related the parent sequences are to each other relative to their divergence from the chimeric sequence, the greater the likelihood that the inferred chimera is real. This metric uses the average sequence identity between the two fragments of the candidate and their corresponding parent sequences as the numerator, and the sequence identity between the parent sequences as the denominator. All calculations are made using a 300 base pair window on either side of the most likely break point. A divergence ratio of 1.1 was empirically determined to be the threshold for classifying sequences as putatively chimeric.
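As an illustration of the divergence ratio defined above, the sketch below computes it for three pre-aligned, equal-length sequences. It assumes gaps are compared as ordinary characters and that the parent-to-parent identity is taken over the full window on both sides of the break point; neither detail is specified in the text, and the function names are hypothetical. Only the ratio is computed; no chimera call is made.

```python
def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length strings agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def divergence_ratio(candidate, parent_left, parent_right, breakpoint, window=300):
    """Mean fragment-to-parent identity divided by parent-to-parent identity."""
    lo = max(0, breakpoint - window)
    hi = min(len(candidate), breakpoint + window)

    # Each fragment of the candidate is compared with its own parent.
    left_id = identity(candidate[lo:breakpoint], parent_left[lo:breakpoint])
    right_id = identity(candidate[breakpoint:hi], parent_right[breakpoint:hi])
    fragment_identity = (left_id + right_id) / 2

    # The two parents are compared with each other over the whole window.
    parent_identity = identity(parent_left[lo:hi], parent_right[lo:hi])
    return fragment_identity / parent_identity
```

A candidate stitched perfectly from two parents that share only half their positions yields a ratio of 2.0, while a candidate identical to two identical parents yields 1.0.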

Similarity of clones to array taxa was calculated with DNADIST (S12) using the DNAML-F84 option assuming a transition:transversion ratio of 2.0 and an A, C, G, T 16S rRNA gene base frequency of 0.2537, 0.2317, 0.3167, 0.1979, respectively. We calculated these parameters empirically from all records of the ‘Greengenes’ 16S rRNA multiple sequence alignment over 1,250 nucleotides in length. The Lane mask (S13) was used to restrict similarity observations to 1,287 conserved columns (lanes) of aligned characters. Cloned sequences from this study were rejected from further analysis when <1,000 characters could be compared to a lane-masked reference sequence. Sequences were assigned to a taxonomic node using a sliding scale of similarity threshold (S14). Phylum, class, order, family, sub-family, or taxon placement was accepted when a clone surpassed similarity thresholds of 80%, 85%, 90%, 92%, 94%, or 97%, respectively. When similarity to nearest database sequence was <94%, the clone was considered to represent a novel sub-family. A full comparison between clone and array analysis is presented in Table S2.
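The sliding scale of similarity thresholds can be expressed as a simple lookup. This sketch assumes similarity is given as a fraction between 0 and 1 and reads "surpassed" as a strict inequality; the names are illustrative only.

```python
# Thresholds from the text, ordered from the most specific rank to the least.
RANK_THRESHOLDS = [
    ("taxon", 0.97),
    ("sub-family", 0.94),
    ("family", 0.92),
    ("order", 0.90),
    ("class", 0.85),
    ("phylum", 0.80),
]


def deepest_rank(similarity: float):
    """Most specific rank whose threshold the clone's similarity surpasses, or None."""
    for rank, cutoff in RANK_THRESHOLDS:
        if similarity > cutoff:
            return rank
    return None


def is_novel_subfamily(similarity: float) -> bool:
    """A clone below 94% similarity to its nearest sequence is a novel sub-family."""
    return similarity < 0.94
```

For instance, a clone with a maximum similarity of 0.973 would be placed at the taxon level, whereas one at 0.93 would be placed only at the family level and counted as a novel sub-family.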

Primers targeting sequences within particular taxa/sub-families were generated by ARB's probe design feature (S15). Melting temperatures were constrained from 45° C. to 65° C. with G+C content between 40 and 70%. The probes were chosen to contain 3′ bases non-complementary to sequences outside of the taxon/sub-family. Primers were matched using Primer3 (S16) to create primer pairs (Table S3). Sequences were generated using the Takara enzyme system as described above with the necessary adjustments in annealing temperatures. Amplicons were purified (PureLink PCR Purification Kit, Invitrogen) and sequenced directly or, if there were multiple unresolved sequences, cloned using a TOPO pCR2.1 cloning kit (Invitrogen, CA) according to the manufacturer's instructions. The M13 primer pair was used for clones to generate insert amplicons for sequencing at UC Berkeley's sequencing facility.

To determine whether changes in 16S rRNA gene concentration could be detected using the array, various quantities of distinct rRNA gene types were hybridized to the array in rotating combinations. We chose environmental organisms, organisms involved in bioremediation, and a pathogen of biodefense relevance. 16S rRNA genes were amplified from each of the organisms in Table S4. Then each of these nine distinct 16S rRNA gene standards was tested once in each concentration category spanning 5 orders of magnitude (0 molecules, 6×10⁷, 1.44×10⁸, 3.46×10⁸, 8.30×10⁸, 1.99×10⁹, 4.78×10⁹, 2.75×10¹⁰, 6.61×10¹⁰, 1.59×10¹¹) with concentrations of individual 16S rRNA gene types rotating between arrays such that each array contained the same total of 16S rRNA gene molecules. This is similar to a Latin Square design, although with a 9×11 format matrix.

A taxon (#9389) consisting only of two sequences of Pseudomonas oleovorans that correlated well with environmental variables was chosen for quantitative PCR confirmation of array-observed quantitative shifts. Primers for this taxon were designed using the ARB (S15) probe match function to determine unique priming sites based upon regions detected by array probes. These regions were then input into Primer3 (S16) in order to choose optimal oligonucleotide primers for PCR. Primer quality was further assessed using Beacon Designer v3.0 (Premier BioSoft, Calif.). Primers 9389F2 (CGACTACCTGGACTGACACT) (SEQ ID NO: 3) and 9389R2 (CACCGGCAGTCTCCTTAGAG) (SEQ ID NO: 4) were chosen to amplify a 436 bp fragment.

To test the specificity of this primer pair, we used a nested PCR approach. 16S rRNA genes were amplified using universal primers (27F, 1492R) from pooled aerosol genomic DNA extracts from both Austin and San Antonio, Tex. These products were purified and used as template in PCR reactions using primer set 9389F2-9389R2. Amplicons were then ligated to pCR2.1 and transformed into E. coli TOP10 cells as recommended by the manufacturer (Invitrogen, CA). Five clones were chosen at random for each of the two cities (10 clones total) and inserts were amplified using vector specific primers M13 forward and reverse. Standard Sanger sequencing was performed and sequences were tested for homology against existing database entries (NCBI GenBank, RDPII and Greengenes).

To assay P. oleovorans 16S rRNA gene copies in genomic DNA extracts, we performed real-time quantitative PCR (qPCR) using an iCycler iQ real-time detection system (BioRad, Calif.) with the iQ Sybr® Green Supermix (BioRad, Calif.) kit. Reaction mixtures (final volume, 25 μl) contained 1× iQ Sybr® Green Supermix, 7.5 pmol of each primer, 25 μg BSA, 0.5 μl DNA extract and DNase/RNase free water. Following enzyme activation (95° C., 3 min), up to 50 cycles of 95° C., 30 s; 61° C., 30 s; 85° C., 10 s and 72° C., 45 s were performed. The specific data acquisition step (85° C. for 10 s) was set above the Tm of potential primer dimers and below the Tm of the product to minimize any non-amplicon Sybr Green fluorescence. Copy number of P. oleovorans 16S rRNA gene molecules was quantified by comparing cycle thresholds to a standard curve (in the range of 7.6×10⁰ to 7.6×10⁵ copies μl⁻¹), run in parallel, using cloned P. oleovorans 16S rRNA amplicons generated by PCR using primers M13 forward and reverse. Regression coefficients for the standard curves were typically greater than 0.99, and post amplification melt curve analyses displayed a single peak at 87.5° C., indicative of specific Pseudomonas oleovorans 16S rRNA gene amplification (data not shown).
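Quantification against a standard curve is conventionally done by regressing cycle threshold (Ct) on log10 copy number and inverting the fit for unknowns. The sketch below shows that conventional calculation with ordinary least squares; it is not the instrument software's routine, and any Ct values passed to it in an example are hypothetical rather than data from this study.

```python
import math


def fit_standard_curve(copies, ct):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared).
    """
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(ct) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in ct)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


def copies_from_ct(ct_value, slope, intercept):
    """Invert the standard curve to estimate copies per microliter for an unknown."""
    return 10 ** ((ct_value - intercept) / slope)
```

The standards would span the stated range, for example `[7.6 * 10 ** k for k in range(6)]` copies μl⁻¹, and a fit would be accepted only if the returned r² is high (the text reports values typically above 0.99).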

To account for scanning intensity variation from array to array, internal standards were added to each experiment. The internal standards were a set of thirteen amplicons generated from yeast and bacterial metabolic genes and five synthetic 16S rRNA-like genes spiked into each aerosol amplicon pool prior to fragmentation. The known concentrations of the amplicons ranged from 4 pM to 605 pM in the final hybridization mix. The intensities resulting from the fifteen corresponding probe sets were natural log transformed. Adjustment factors for each array were calculated by fitting the linear model using the least-squares method. An array's adjustment factor was subtracted from each probe set's ln(intensity).

For each day of aerosol sampling, 15 factors including humidity, wind, temperature, precipitation, pressure, particulate matter, and week of year were recorded from the U.S. National Climatic Data Center (http://www.ncdc.noaa.gov) or the Texas Natural Resource Conservation Commission (http://www.tceq.state.tx.us). The weekly mean, minimum, maximum, and range of values were calculated for each factor from the collected data. The changes in ln(intensity) for each taxon considered present in the study were tested for correlation against the environmental conditions. The resulting p-values were adjusted using the step-up False Discovery Rate (FDR) controlling procedure (S18).
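The text does not name the step-up procedure beyond citing S18. Assuming it refers to the widely used Benjamini-Hochberg step-up method, the p-value adjustment could look like the following sketch; the function name is illustrative.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity as we go.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A taxon-by-factor correlation would then be reported as significant only if its adjusted p-value falls below the chosen FDR level.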

Multivariate regression tree analysis (S19, S20) was carried out using the package ‘mvpart’ within the ‘R’ statistical programming environment. A Bray-Curtis-based distance matrix was created using the function ‘gdist’. The Bray-Curtis measure of dissimilarity is generally regarded as a good measure of ecological distance when dealing with ‘species’ abundance as it allows for non-linear responses to environmental gradients (S19, S21).
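For reference, the Bray-Curtis dissimilarity between two abundance vectors is the sum of absolute differences divided by the sum of all abundances. The Python sketch below illustrates the measure itself; it is a stand-in for explanation and does not reproduce the ‘gdist’ function or its options.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(x + y for x, y in zip(a, b))
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def distance_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities between samples."""
    return [[bray_curtis(s, t) for t in samples] for s in samples]
```

Identical samples give 0, and samples with no taxa in common give 1.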

Prior to rarefaction analysis a distance matrix (DNAML homology) of clone sequences was created using an online tool at http://greengenes.lbl.gov/cgi-bin/nph-distance_matrix.cgi following alignment of the sequences using the NAST aligner (http://greengenes.lbl.gov/NAST) (S22). DOTUR (S23) was used to generate rarefaction curves, Chao1 and ACE richness predictions and rank-abundance curves. Nearest neighbor joining was used with 1000 iterations for bootstrapping.

DNA yields in the pooled weekly filter washes ranged from 0.522 ng to 154 ng. As only an aliquot of the filter washes was extracted, we extrapolate the range of DNA extractable from each daily filter to be between 150 ng and 4300 ng assuming 10% extraction efficiency. Using previous estimates of bacterial to fungal ratios in aerosols (49% bacterial, 44% fungal clones; S24) this range is equivalent to 1.2×10⁷ to 3.5×10⁸ bacterial cells per filter assuming a mean DNA content of a bacterial cell of 6 fg (S25).
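The cell-count range quoted above can be reproduced with a short calculation. This sketch assumes the 49% bacterial clone fraction is applied directly as the bacterial share of the DNA mass, which appears consistent with the reported figures; the function name is illustrative.

```python
def bacterial_cells(dna_ng: float, bacterial_fraction: float = 0.49,
                    fg_per_cell: float = 6.0) -> float:
    """Estimate bacterial cells from total DNA mass per filter (1 ng = 1e6 fg)."""
    dna_fg = dna_ng * 1e6
    return dna_fg * bacterial_fraction / fg_per_cell


low = bacterial_cells(150)     # about 1.2e7 cells per filter
high = bacterial_cells(4300)   # about 3.5e8 cells per filter
```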

TABLE S1 Spike-in controls of functional genes and synthetic 16S rRNA-like genes used for internal array normalization.

Control spike | Molecules applied | Description
Affymetrix control spikes
AFFX-BioB-5_at | 5.83 × 10¹⁰ | E. coli biotin synthetase
AFFX-BioB-M_at | 5.43 × 10¹⁰ | E. coli biotin synthetase
AFFX-BioC-5_at | 2.26 × 10¹⁰ | E. coli bioC protein
AFFX-BioC-3_at | 1.26 × 10¹⁰ | E. coli bioC protein
AFFX-BioDn-3_at | 1.68 × 10¹⁰ | E. coli dethiobiotin synthetase
AFFX-CreX-5_at | 2.17 × 10⁹ | Bacteriophage P1 cre recombinase protein
AFFX-DapX-5_at | 9.03 × 10⁸ | B. subtilis dapB, dihydrodipicolinate reductase
AFFX-DapX-M_at | 3.03 × 10¹⁰ | B. subtilis dapB, dihydrodipicolinate reductase
YFL039C | 5.02 × 10⁸ | Saccharomyces, Gene for actin (Act1p) protein
YER022W | 1.21 × 10⁹ | Saccharomyces, RNA polymerase II mediator complex subunit (SRB4p)
YER148W | 2.91 × 10⁹ | Saccharomyces, TATA-binding protein, general transcription factor (SPT15)
YEL002C | 7.00 × 10⁹ | Saccharomyces, Beta subunit of the oligosaccharyl transferase (OST) glycoprotein complex (WBP1)
YEL024W | 7.29 × 10¹⁰ | Saccharomyces, Ubiquinol-cytochrome-c reductase (RIP1)
Synthetic 16S rRNA control spikes
SYNM.neurolyt_st | 6.74 × 10⁸ | Synthetic derivative of Mycoplasma neurolyticum 16S rRNA gene
SYNLc.oenos_st | 3.90 × 10⁹ | Synthetic derivative of Leuconostoc oenos 16S rRNA gene
SYNCau.cres8_st | 9.38 × 10⁹ | Synthetic derivative of Caulobacter crescentus 16S rRNA gene
SYNFer.nodosm_st | 4.05 × 10¹⁰ | Synthetic derivative of Fervidobacterium nodosum 16S rRNA gene
SYNSap.grandi_st | 1.62 × 10⁹ | Synthetic derivative of Saprospira grandis 16S rRNA gene

TABLE S2 Comparison between clone and array results.

Sub-family | Array detection¹: array 3/3 replicates (pass = 1, fail = 0) | Clone detection: number of clones assigned to sub-family² | DNAML similarity: maximum similarity³ | Chimera checking⁴: maximum preference score⁵ | Chimera checking⁴: maximum divergence ratio⁶ | Comparison: array only (pass = 1, fail = 0) | Comparison: array and cloning (pass = 1, fail = 0) | Comparison: cloning only (pass = 1, fail = 0)
Bacteria; AD3; Unclassified; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Acidobacteria; Acidobacteria-10; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Acidobacteria; Acidobacteria-4; Ellin6075/11-25; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Acidobacteria; Acidobacteria-6; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Acidobacteria; Acidobacteria; Acidobacteriales; Acidobacteriaceae; sf_14 | 1 | 3 | 0.973 | 1.16 | 1.06 | 0 | 1 | 0
Bacteria; Acidobacteria; Acidobacteria; Acidobacteriales; Acidobacteriaceae; sf_16 | 1 | | | | | 1 | 0 | 0
Bacteria; Acidobacteria; Solibacteres; Unclassified; Unclassified; sf_1 | 1 | 2 | 0.960 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Acidimicrobiales; Acidimicrobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Acidimicrobiales; Microthrixineae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Acidimicrobiales; Microthrixineae; sf_12 | 0 | 1 | 0.947 | 0.00 | 0.00 | 0 | 0 | 1
Bacteria; Actinobacteria; Actinobacteria; Acidimicrobiales; Unclassified; sf_1 | 1 | 1 | 0.961 | 1.28 | 1.06 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Acidothermaceae; sf_1 | 1 | 1 | 0.947 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Actinomycetaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Actinosynnemataceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Brevibacteriaceae; sf_1 | 1 | 4 | 0.998 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Cellulomonadaceae; sf_1 | 1 | 2 | 0.981 | 1.20 | 1.08 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Corynebacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Dermabacteraceae; sf_1 | 1 | 2 | 0.999 | 1.21 | 1.03 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Dermatophilaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Dietziaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Frankiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Geodermatophilaceae; sf_1 | 1 | 2 | 1.000 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Gordoniaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Intrasporangiaceae; sf_1 | 1 | 10 | 0.999 | 1.20 | 1.18 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Kineosporiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Microbacteriaceae; sf_1 | 1 | 4 | 0.999 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Micrococcaceae; sf_1 | 1 | 2 | 0.985 | 1.26 | 1.15 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Micromonosporaceae; sf_1 | 1 | 3 | 1.000 | 1.27 | 1.20 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Mycobacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Nocardiaceae; sf_1 | 1 | 1 | 0.999 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Nocardioidaceae; sf_1 | 1 | 4 | 0.994 | 1.16 | 1.07 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Nocardiopsaceae; sf_1 | 1 | 1 | 1.000 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Promicromonosporaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Propionibacteriaceae; sf_1 | 1 | 3 | 0.982 | 1.20 | 1.05 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Pseudonocardiaceae; sf_1 | 1 | 3 | 0.999 | 1.14 | 1.11 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Sporichthyaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Streptomycetaceae; sf_1 | 1 | 3 | 0.998 | 1.30 | 1.14 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Streptomycetaceae; sf_3 | 1 | 2 | 0.996 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Streptosporangiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Thermomonosporaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Unclassified; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Williamsiaceae; sf_1 | 0 | 1 | 0.987 | 1.18 | 1.12 | 0 | 0 | 1
Bacteria; Actinobacteria; Actinobacteria; Bifidobacteriales; Bifidobacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Rubrobacterales; Rubrobacteraceae; sf_1 | 1 | 13 | 0.990 | 1.56 | 1.05 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Aquificae; Aquificae; Aquificales; Hydrogenothermaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; BRC1; Unclassified; Unclassified; Unclassified; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Bacteroidetes; Bacteroidales; Porphyromonadaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Bacteroidetes; Bacteroidales; Prevotellaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Bacteroidetes; Bacteroidales; Rikenellaceae; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Bacteroidetes; Bacteroidales; Unclassified; sf_15 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Flavobacteria; Flavobacteriales; Blattabacteriaceae; sf_1 | 1 | 1 | 0.943 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Bacteroidetes; Flavobacteria; Flavobacteriales; Flavobacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Flavobacteria; Flavobacteriales; Unclassified; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; KSA1; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Crenotrichaceae; sf_11 | 1 | 6 | 0.973 | 1.22 | 1.07 | 0 | 1 | 0
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Flammeovirgaceae; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Flexibacteraceae; sf_19 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Sphingobacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Unclassified; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Unclassified; sf_6 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Unclassified; Unclassified; Unclassified; sf_4 | 1 | | | | | 1 | 0 | 0
Bacteria; Caldithrix; Unclassified; Caldithrales; Caldithraceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Caldithrix; Unclassified; Caldithrales; Caldithraceae; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Chlamydiae; Chlamydiae; Chlamydiales; Chlamydiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Chlorobi; Chlorobia; Chlorobiales; Chlorobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Chlorobi; Unclassified; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Chlorobi; Unclassified; Unclassified; Unclassified; sf_6 | 1 | | | | | 1 | 0 | 0
Bacteria; Chlorobi; Unclassified; Unclassified; Unclassified; sf_9 | 1 | | | | | 1 | 0 | 0
Bacteria; Chloroflexi; Anaerolineae; Chloroflexi-1a; Unclassified; sf_1 | 1 | 1 | 0.992 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Chloroflexi; Anaerolineae; Chloroflexi-1b; Unclassified; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Chloroflexi; Anaerolineae; Unclassified; Unclassified; sf_9 | 1 | | | | | 1 | 0 | 0
Bacteria; Chloroflexi; Chloroflexi-3; Roseiflexales; Unclassified; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Chloroflexi; Dehalococcoidetes; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Chloroflexi; Unclassified; Unclassified; Unclassified; sf_12 | 1 | | | | | 1 | 0 | 0
Bacteria; Coprothermobacteria; Unclassified; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Chloroplasts; Chloroplasts; sf_11 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Chloroplasts; Chloroplasts; sf_5 | 1 | 3 | 0.995 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Chroococcales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Chroococcidiopsis; Unclassified; sf_1 | 0 | 1 | 0.954 | 1.09 | 1.12 | 0 | 0 | 1
Bacteria; Cyanobacteria; Cyanobacteria; Leptolyngbya; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Nostocales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Oscillatoriales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Phormidium; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Plectonema; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Prochlorales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Pseudanabaena; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Spirulina; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Unclassified; Unclassified; Unclassified; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Unclassified; Unclassified; Unclassified; sf_8 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Unclassified; Unclassified; Unclassified; sf_9 | 1 | | | | | 1 | 0 | 0
Bacteria; DSS1; Unclassified; Unclassified; Unclassified; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Deinococcus-Thermus; Unclassified; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Deinococcus-Thermus; Unclassified; Unclassified; Unclassified; sf_3 | 0 | 1 | 0.993 | 1.19 | 1.05 | 0 | 0 | 1
Bacteria; Firmicutes; Bacilli; Bacillales; Alicyclobacillaceae; sf_1 | 1 | 2 | 0.963 | 1.14 | 1.15 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Bacillales; Bacillaceae; sf_1 | 1 | 151 | 1.000 | 1.37 | 1.23 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Bacillales; Halobacillaceae; sf_1 | 1 | 6 | 0.997 | 1.15 | 1.07 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Bacillales; Paenibacillaceae; sf_1 | 1 | 14 | 0.999 | 1.19 | 1.07 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Bacillales; Sporolactobacillaceae; sf_1 | 1 | 2 | 0.999 | 1.12 | 1.04 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Bacillales; Staphylococcaceae; sf_1 | 1 | 6 | 0.999 | 1.30 | 1.06 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Bacillales; Thermoactinomycetaceae; sf_1 | 1 | 6 | 0.999 | 1.15 | 1.09 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Exiguobacterium; Unclassified; sf_1 | 0 | 1 | 0.998 | 0.00 | 0.00 | 0 | 0 | 1
Bacteria; Firmicutes; Bacilli; Lactobacillales; Aerococcaceae; sf_1 | 1 | 6 | 0.998 | 1.23 | 1.26 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Lactobacillales; Carnobacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Bacilli; Lactobacillales; Enterococcaceae; sf_1 | 1 | 3 | 0.999 | 1.32 | 1.08 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Lactobacillales; Lactobacillaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Bacilli; Lactobacillales; Leuconostocaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Bacilli; Lactobacillales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Catabacter; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Catabacter; Unclassified; Unclassified; sf_4 | 1 | 1 | 0.954 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridiaceae; sf_12 | 1 | 14 | 0.998 | 1.45 | 1.15 | 0 | 1 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Lachnospiraceae; sf_5 | 1 | 2 | 0.990 | 1.12 | 1.12 | 0 | 1 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Peptococc/Acidaminococc; sf_11 | 1 | 4 | 0.980 | 1.12 | 1.16 | 0 | 1 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Peptostreptococcaceae; sf_5 | 1 | 1 | 0.976 | 1.21 | 1.04 | 0 | 1 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Syntrophomonadaceae; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Unclassified; sf_17 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Clostridia; Unclassified; Unclassified; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Desulfotomaculum; Unclassified; Unclassified; sf_1 | 1 | 3 | 0.984 | 1.14 | 1.04 | 0 | 1 | 0
Bacteria; Firmicutes; Mollicutes; Acholeplasmatales; Acholeplasmataceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Symbiobacteria; Symbiobacterales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Unclassified; Unclassified; Unclassified; sf_8 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; gut clone group; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Gemmatimonadetes; Unclassified; Unclassified; Unclassified; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Natronoanaerobium; Unclassified; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Nitrospira; Nitrospira; Nitrospirales; Nitrospiraceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; OD1; OP11-5; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; OP8; Unclassified; Unclassified; Unclassified; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Planctomycetes; Planctomycetacia; Planctomycetales; Anammoxales; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Planctomycetes; Planctomycetacia; Planctomycetales; Anammoxales; sf_4 | 1 | | | | | 1 | 0 | 0
Bacteria; Planctomycetes; Planctomycetacia; Planctomycetales; Pirellulae; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Planctomycetes; Planctomycetacia; Planctomycetales; Planctomycetaceae; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Acetobacterales; Acetobacteraceae; sf_1 | 1 | 1 | 0.943 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Acetobacterales; Roseococcaceae; sf_1 | 1 | 6 | 0.980 | 1.24 | 1.17 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Azospirillales; Azospirillaceae; sf_1 | 1 | 1 | 0.947 | 1.12 | 1.10 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Azospirillales; Magnetospirillaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Azospirillales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Beijerinck/Rhodoplan/Methylocyst; sf_3 | 1 | 2 | 0.951 | 1.13 | 1.08 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Bradyrhizobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Hyphomicrobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Methylobacteriaceae; sf_1 | 1 | 2 | 0.999 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Unclassified; sf_1 | 1 | 4 | 0.982 | 1.15 | 1.11 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Xanthobacteraceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Caulobacterales; Caulobacteraceae; sf_1 | 1 | 1 | 0.968 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Consistiales; Caedibacteraceae; sf_3 | 1 | 1 | 0.951 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Consistiales; Caedibacteraceae; sf_4 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Consistiales; Caedibacteraceae; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Consistiales; Unclassified; sf_4 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Devosia; Unclassified; sf_1 | 1 | 1 | 0.976 | 1.18 | 1.05 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Ellin314/wr0007; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Ellin329/Riz1046; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Fulvimarina; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Bartonellaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Beijerinck/Rhodoplan/Methylocyst; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Bradyrhizobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Brucellaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Hyphomicrobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Phyllobacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Rhizobiaceae; sf_1 | 1 | 2 | 0.981 | 1.27 | 1.26 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Hyphomonadaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Rhodobacteraceae; sf_1 | 1 | 6 | 0.985 | 1.13 | 1.11 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rickettsiales; Anaplasmataceae; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rickettsiales; Rickettsiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rickettsiales; Unclassified; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_1 | 1 | 9 | 0.994 | 1.23 | 1.10 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_15 | 1 | 6 | 0.990 | 1.13 | 1.06 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; Unclassified; sf_1 | 0 | 3 | 0.997 | 1.20 | 1.08 | 0 | 0 | 1
Bacteria; Proteobacteria; Alphaproteobacteria; Unclassified; Unclassified; sf_6 | 1 | 1 | 0.954 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Alcaligenaceae; sf_1 | 1 | 3 | 1.000 | 1.35 | 1.07 | 0 | 1 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Burkholderiaceae; sf_1 | 1 | 12 | 1.000 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Comamonadaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Oxalobacteraceae; sf_1 | 1 | 2 | 0.996 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Ralstoniaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; MND1 clone group; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Methylophilales; Methylophilaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Neisseriales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Nitrosomonadales; Nitrosomonadaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Rhodocyclales; Rhodocyclaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Unclassified; Unclassified; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; AMD clone group; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Bdellovibrionales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfobacterales; Desulfobulbaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfobacterales; Nitrospinaceae; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfobacterales; Unclassified; sf_4 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfovibrionales; Desulfohalobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfovibrionales; Desulfovibrionaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfovibrionales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; EB1021 group; Unclassified; sf_4 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Myxococcales; Myxococcaceae; sf_1 | 0 | 1 | 0.974 | 0.00 | 0.00 | 0 | 0 | 1
Bacteria; Proteobacteria; Deltaproteobacteria; Myxococcales; Polyangiaceae; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Myxococcales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria;Deltaproteobacteria; 1 1 0 0 Syntrophobacterales; Syntrophobacteraceae;sf_1 Bacteria; Proteobacteria; Deltaproteobacteria; 1 1 0 0Unclassified; Unclassified; sf_9 Bacteria; Proteobacteria;Deltaproteobacteria; 1 1 0 0 dechlorinating clone group; Unclassified;sf_1 Bacteria; Proteobacteria; Epsilonproteobacteria; 1 1 0 0Campylobacterales; Campylobacteraceae; sf_3 Bacteria; Proteobacteria;Epsilonproteobacteria; 1 1 0 0 Campylobacterales; Helicobacteraceae;sf_3 Bacteria; Proteobacteria; Epsilonproteobacteria; 1 1 0 0Campylobacterales; Unclassified; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 Aeromonadales; Aeromonadaceae; sf_1Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0 Alteromonadales;Alteromonadaceae; sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 11 0 0 Alteromonadales; Pseudoalteromonadaceae; sf_1 Bacteria;Proteobacteria; Gammaproteobacteria; 1 1 0 0 Alteromonadales;Unclassified; sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 00 Chromatiales; Chromatiaceae; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 Chromatiales; Ectothiorhodospiraceae; sf_1Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0 Chromatiales;Unclassified; sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 00 Ellin307/WD2124; Unclassified; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 3 0.995 1.12 1.04 0 1 0 Enterobacteriales;Enterobacteriaceae; sf_1 Bacteria; Proteobacteria; Gammaproteobacteria;1 1 0 0 Enterobacteriales; Enterobacteriaceae; sf_6 Bacteria;Proteobacteria; Gammaproteobacteria; 1 1 0 0 GAO cluster; Unclassified;sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0Legionellales; Coxiellaceae; sf_3 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 Legionellales; Unclassified; sf_1 Bacteria;Proteobacteria; Gammaproteobacteria; 1 1 0 0 Legionellales;Unclassified; sf_3 Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 00 Methylococcales; Methylococcaceae; sf_1 Bacteria; 
Proteobacteria;Gammaproteobacteria; 1 1 0 0 Oceanospirillales; Alcanivoraceae; sf_1Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0Oceanospirillales; Halomonadaceae; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 Oceanospirillales; Unclassified; sf_3Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0 Pasteurellales;Pasteurellaceae; sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 1 20.996 1.16 1.10 0 1 0 Pseudomonadales; Moraxellaceae; sf_3 Bacteria;Proteobacteria; Gammaproteobacteria; 1 2 0.998 1.18 1.03 0 1 0Pseudomonadales; Pseudomonadaceae; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 SUP05; Unclassified; sf_1 Bacteria;Proteobacteria; Gammaproteobacteria; 1 1 0 0 Shewanella; Unclassified;sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0 Symbionts;Unclassified; sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 00 Thiotrichales; Francisellaceae; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 Thiotrichales; Piscirickettsiaceae; sf_3Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0 Thiotrichales;Thiotrichaceae; sf_3 Bacteria; Proteobacteria; Gammaproteobacteria; 1 10 0 Unclassified; Unclassified; sf_3 Bacteria; Proteobacteria;Gammaproteobacteria; 1 2 0.997 0.00 0.00 0 1 0 Xanthomonadales;Xanthomonadaceae; sf_3 Bacteria; Proteobacteria; Gammaproteobacteria; 11 0 0 aquatic clone group; Unclassified; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 uranium waste clones; Unclassified; sf_1Bacteria; Proteobacteria; Unclassified; 1 1 0 0 Unclassified;Unclassified; sf_20 Bacteria; Spirochaetes; Spirochaetes; 1 1 0 0Spirochaetales; Leptospiraceae; sf_3 Bacteria; Spirochaetes;Spirochaetes; 1 1 0 0 Spirochaetales; Spirochaetaceae; sf_1 Bacteria;Spirochaetes; Spirochaetes; 1 1 0 0 Spirochaetales; Spirochaetaceae;sf_3 Bacteria; TM7; TM7-3; Unclassified; 1 1 0 0 Unclassified; sf_1Bacteria; TM7; Unclassified; Unclassified; 1 1 0 0 Unclassified; sf_1Bacteria; Verrucomicrobia; Unclassified; 1 1 0 0 
Unclassified; Unclassified; sf_4
Bacteria; Verrucomicrobia; Unclassified; Unclassified; Unclassified; sf_5 | 1 1 0 0
Bacteria; Verrucomicrobia; Verrucomicrobiae; Verrucomicrobiales; Unclassified; sf_3 | 1 1 0 0
Bacteria; Verrucomicrobia; Verrucomicrobiae; Verrucomicrobiales; Verrucomicrobia subdivision 5; sf_1 | 1 1 0 0
Bacteria; Verrucomicrobia; Verrucomicrobiae; Verrucomicrobiales; Verrucomicrobiaceae; sf_6 | 1 1 0 0
Bacteria; Verrucomicrobia; Verrucomicrobiae; Verrucomicrobiales; Verrucomicrobiaceae; sf_7 | 1 1 0 0
Bacteria; WS3; Unclassified; Unclassified; Unclassified; sf_1 | 1 1 0 0
Bacteria; marine group A; mgA-1; Unclassified; Unclassified; sf_1 | 1 1 0 0
Bacteria; marine group A; mgA-2; Unclassified; Unclassified; sf_1 | 1 1 0 0
Totals | Array sub-families: 238 | Clone sub-families: 67 | Array only sub-families: 178 | Array and clone sub-families: 60 | Clone only sub-families: 7

¹A sub-family must have at least one taxon present above the positive probe threshold of 0.92 (92%) in all three replicates to be considered present.
²For a clone to be assigned to a sub-family its DNAML similarity must be above the 0.94 (94%) threshold defined for sub-families.
³This is the maximum DNAML similarity measured.
⁴Both maximum preference score and maximum divergence ratio must pass the criteria below for a clone to be considered non-chimeric.
⁵Bellerophon preference score; a ratio of 1.3 or greater has been empirically shown to demonstrate a chimeric molecule.
⁶Bellerophon divergence ratio. This is a new metric devised to aid chimera detection; a score greater than 1.1 indicates a potential chimera.
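The thresholds stated in the footnotes above can be illustrated with a minimal Python sketch. All function and constant names here are hypothetical and are not part of the disclosed system. The sketch assumes that each taxon is summarized by the fraction of positive probes in each of three replicates, that "above the positive probe threshold" includes the threshold value itself (as the "92% or greater" wording of Table S7 suggests), and that the two Bellerophon criteria are reported as separate flags rather than combined into a single verdict.

```python
from typing import Dict, Sequence, Tuple

# Threshold values taken from the table footnotes; the names are illustrative.
POSITIVE_PROBE_THRESHOLD = 0.92   # footnote 1
SUBFAMILY_SIMILARITY = 0.94       # footnote 2
PREFERENCE_SCORE_LIMIT = 1.3      # footnote 5
DIVERGENCE_RATIO_LIMIT = 1.1      # footnote 6


def taxon_present(positive_fractions: Sequence[float]) -> bool:
    """True when all three replicates reach the positive probe threshold."""
    return len(positive_fractions) == 3 and all(
        f >= POSITIVE_PROBE_THRESHOLD for f in positive_fractions
    )


def subfamily_present(taxa: Dict[str, Sequence[float]]) -> bool:
    """A sub-family is called present if at least one member taxon is present."""
    return any(taxon_present(fractions) for fractions in taxa.values())


def clone_in_subfamily(dnaml_similarity: float) -> bool:
    """A clone is assigned only when its similarity is above the threshold."""
    return dnaml_similarity > SUBFAMILY_SIMILARITY


def chimera_flags(preference: float, divergence: float) -> Tuple[bool, bool]:
    """Return (preference flag, divergence flag); True marks a suspect value."""
    return (preference >= PREFERENCE_SCORE_LIMIT,
            divergence > DIVERGENCE_RATIO_LIMIT)
```

For example, `subfamily_present({"t1": [0.5, 0.5, 0.5], "t2": [0.93, 0.92, 0.99]})` evaluates to true because the second taxon reaches the threshold in all three replicates.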

TABLE S3 Confirmation of array sub-family detections by taxon-specific PCR and sequencing.

GenBank accession number of retrieved sequence | Sub-family (sf) verified | Closest BLAST homolog, GenBank accession number (% identity) | SEQ ID NO. | Primer Sequences (5′ to 3′) | Tm (° C.) | Ta (° C.)
DQ236248 | Actinobacteria, Actinosynnemataceae, sf_1 | Actinokineospora diospyrosa, AF114797 (94.3%) | 5 | For-ACCAAGGCTACGACGGGTA | 60.5 | 67.0
 | | | 6 | Rev-ACACACCGCATGTCAAACC | 60.4 |
DQ515230 | Actinobacteria, Bifidobacteriaceae, sf_1 | Bifidobacterium adolescentis, AF275881 (99.6%) | 7 | For-GGGTGGTAATGCCSGATG | 60.0 | 62.0
 | | | 8 | Rev-CCRCCGTTACACCGGGAA | 64.0 |
DQ236245 | Actinobacteria, Kineosporiaceae, sf_1 | Actinomycetaceae SR 11, X87617 (97.7%) | 9 | For-CAATGGACTCAAGCCTGATG | 53.5 | 53.0
 | | | 10 | Rev-CTCTAGCCTGCCCGTTTC | 53.9 |
DQ236250 | Chloroflexi, Anaerolineae, sf_9 | penguin droppings clone KD4-96, AY218649 (90%) | 11 | For-GAGAGGATGATCAGCCAG | 54.0 | 61.7
 | | | 12 | Rev-TACGGYTACCTTGTTACGACTT | 57.0 |
DQ236247 | Cyanobacteria, Geitlerinema, sf_1 | Geitlerinema sp. PCC 7105, AB039010 (89.3%) | 13 | For-TCCGTAGGTGGCTGTTCAAGTCTG | 62.2 | 55.0
 | | | 14 | Rev-GCTTTCGTCCCTCAGTGTCAGTTG | 61.7 |
DQ236246 | Cyanobacteria, Thermosynechococcus, sf_1 | Thermosynechococcus elongatus BP-1, BA000039 (96.0%) | 15 | For-TGTCGTGAGATGTTGGGTTAAGTC | 58.7 | 55.0
 | | | 16 | Rev-TGAGCCGTGGTTTAAGAGATTAGC | 58.8 |
DQ129654 | Gammaproteobacteria, Pseudoalteromonadaceae, sf_1 | Pseudoalteromonas sp. S511-1, AB029824 (99.1%) | 17 | For-GCCTCACGCCATAAGATTAG | 53.1 | 50.0
 | | | 18 | Rev-GTGCTTTCTTCTGTAAGTAACG | 53.0 |
DQ129656 | Nitrospira, Nitrospiraceae, sf_1 | Nitrospira moscoviensis, X82558 (98.5%) | 19 | For-TCGAAAAGCGTGGGG | 57.6 | 47.0
 | | | 20 | Rev-CTTCCTCCCCCGTTC | 54.4 |
DQ129666 | Planctomycetes, Planctomycetaceae, sf_3 | Planctomyces brasiliensis, AJ231190 (94%) | 21 | For-GAAACTGCCCAGACAC | 50.0 | 60.0
 | | | 22 | Rev-AGTAACGTTCGCACAG | 48.0 |
DQ515231 | Proteobacteria, Campylobacteraceae, sf_3 | Uncultured Arcobacter sp. clone DS017, DQ234101 (98%) | 23 | For-GGATGACACTTTTCGGAG | 54.0 | 48.0
 | | | 24 | Rev-AATTCCATCTGCCTCTCC | 55.0 |
DQ129662 | Spirochaetes, Leptospiraceae, sf_3 | Leptospira borgpetersenii, X17547 (90.9%) | 25 | For-GGCGGCGCGTTTTAAGC | 57.0 | 58.7
 | | | 26 | Rev-ACTCGGGTGGTGTGACG | 57.0 |
DQ129661 | Spirochaetes, Spirochaetaceae, sf_1 | Spirochaeta asiatica, X93926 (90.0%) | | | |
DQ129660 | Spirochaetes, Spirochaetaceae, sf_3 | Borrelia hermsii, M72398 (91.0%) | | | |
DQ236249 | TM7, TM7-3, sf_1 | oral clone EW096, AY349415 (88.8%) | 27 | For-AYTGGGCGTAAAGAGTTGC | 58.0 | 66.3
 | | | 28 | Rev-TACGGYTACCTTGTTACGACTT | 57.0 |

Tm = Melting temperature; Ta = Optimal annealing temperature used in PCR reaction.

TABLE S4 Bacteria and Archaea used for Latin square hybridization assays.

Organism | Phylum/Sub-phylum | ATCC
Arthrobacter oxydans | Actinobacteria | 14359^(a)
Bacillus anthracis AMES pX01-pX02- | Firmicutes | —^(b)
Caulobacter crescentus CB15 | Alpha-proteobacteria | 19089
Dechloromonas agitata CKB | Beta-proteobacteria | 700666^(c)
Dehalococcoides ethenogenes 195 | Chloroflexi | —^(d)
Desulfovibrio vulgaris Hildenborough | Delta-proteobacteria | 29579^(e)
Francisella tularensis | Gamma-proteobacteria | 6223
Geobacter metallireducens GS-15 | Delta-proteobacteria | 53774^(c)
Geothrix fermentans H-5 | Acidobacteria | 700665^(c)
Sulfolobus solfataricus | Crenarchaeota | 35092

^(a)Strain obtained from Hoi-Ying Holman, LBNL.
^(b)Strain obtained from Arthur Friedlander, USAMRIID.
^(c)Strain obtained from John Coates, UC Berkeley.
^(d)Strain obtained from Lisa Alvarez-Cohen, UC Berkeley.
^(e)Strain obtained from Terry Hazen, LBNL.

TABLE S5 Correlations between environmental/temporal parameters.

Austin
 | week | mean TEMP | max MAXTEMP | min MINTEMP | range MINTEMP | mean WDSP | mean SLP | max VISIB | max PM2.5 | range PM2.5 | Sub-family-level richness
week | 1.000
mean TEMP | 0.703 | 1.000
max MAXTEMP | 0.471 | 0.665 | 1.000
min MINTEMP | 0.685 | 0.691 | 0.073 | 1.000
range MINTEMP | −0.267 | −0.149 | 0.571 | −0.777 | 1.000
mean WDSP | −0.540 | −0.053 | −0.038 | −0.195 | 0.136 | 1.000
mean SLP | 0.607 | 0.145 | 0.162 | 0.352 | −0.188 | −0.380 | 1.000
max VISIB | 0.486 | 0.311 | 0.400 | 0.230 | 0.063 | −0.498 | 0.318 | 1.000
max PM2.5 | −0.529 | −0.219 | −0.331 | −0.162 | −0.075 | 0.617 | −0.409 | −0.817 | 1.000
range PM2.5 | −0.507 | −0.219 | −0.366 | −0.117 | −0.134 | 0.613 | −0.407 | −0.829 | 0.989 | 1.000
Sub-family-level richness | −0.074 | −0.104 | 0.098 | −0.460 | 0.440 | 0.251 | −0.182 | −0.066 | −0.058 | −0.063 | 1.000

San Antonio
 | week | mean TEMP | max MAXTEMP | min MINTEMP | range MINTEMP | mean WDSP | mean SLP | max VISIB | max PM2.5 | range PM2.5 | Sub-family-level richness
week | 1.000
mean TEMP | 0.452 | 1.000
max MAXTEMP | 0.189 | 0.553 | 1.000
min MINTEMP | 0.570 | 0.622 | 0.044 | 1.000
range MINTEMP | −0.318 | −0.116 | 0.630 | −0.749 | 1.000
mean WDSP | −0.523 | −0.014 | −0.015 | −0.014 | 0.001 | 1.000
mean SLP | 0.722 | 0.029 | −0.088 | 0.300 | −0.291 | −0.495 | 1.000
max VISIB | 0.420 | 0.169 | 0.298 | −0.054 | 0.240 | −0.234 | 0.501 | 1.000
max PM2.5 | −0.508 | −0.157 | −0.197 | −0.022 | −0.114 | 0.189 | −0.420 | −0.830 | 1.000
range PM2.5 | −0.515 | −0.164 | −0.201 | 0.000 | −0.134 | 0.255 | −0.455 | −0.843 | 0.991 | 1.000
Sub-family-level richness | 0.125 | −0.016 | −0.050 | 0.024 | −0.051 | −0.419 | 0.175 | −0.054 | −0.064 | −0.102 | 1.000

Underlined font indicates a significant positive correlation, while italic font indicates a significant negative correlation at a 95% confidence interval.

TABLE S6 Sub-families detected in Austin or San Antonio correlating significantly with environmental parameters. All of the below are in the Domain of Bacteria.

Phylum | Class | Order | Family | Sub-family | Taxon and representative organism name | Environ. factor | Correl. Coeff. | p value | BH adjusted p value^(a)
Actinobacteria | Actinobacteria | Actinomycetales | Unclassified | sf_3 | 1114 clone PENDANT-38 | max TEMP | 0.64 | 4.05E−05 | 2.49E−02
Actinobacteria | Actinobacteria | Actinomycetales | Unclassified | sf_3 | 1114 clone PENDANT-38 | mean TEMP | 0.66 | 2.16E−05 | 2.01E−02
Actinobacteria | Actinobacteria | Actinomycetales | Unclassified | sf_3 | 1114 clone PENDANT-38 | week | 0.63 | 6.73E−05 | 3.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Gordoniaceae | sf_1 | 1116 Gordona terrae | week | 0.61 | 1.18E−04 | 3.68E−02
Actinobacteria | Actinobacteria | Actinomycetales | Actinosynnemataceae | sf_1 | 1125 Actinokineospora diospyrosa str. NRRL B-24047T | max TEMP | 0.6 | 1.53E−04 | 4.30E−02
Actinobacteria | Actinobacteria | Actinomycetales | Actinosynnemataceae | sf_1 | 1125 Actinokineospora diospyrosa str. NRRL B-24047T | week | 0.63 | 7.42E−05 | 3.38E−02
Actinobacteria | Actinobacteria | Actinomycetales | Streptomycetaceae | sf_1 | 1128 Streptomyces sp. str. YIM 80305 | week | 0.7 | 3.75E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Sporichthyaceae | sf_1 | 1223 Sporichthya polymorpha | mean TEMP | 0.61 | 1.42E−04 | 4.21E−02
Actinobacteria | Actinobacteria | Actinomycetales | Sporichthyaceae | sf_1 | 1223 Sporichthya polymorpha | min MINTEMP | 0.61 | 1.50E−04 | 4.27E−02
Actinobacteria | Actinobacteria | Actinomycetales | Sporichthyaceae | sf_1 | 1223 Sporichthya polymorpha | week | 0.7 | 4.39E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Microbacteriaceae | sf_1 | 1264 Waste-gas biofilter clone BIhi33 | mean TEMP | 0.61 | 1.47E−04 | 4.25E−02
Actinobacteria | Actinobacteria | Actinomycetales | Microbacteriaceae | sf_1 | 1264 Waste-gas biofilter clone BIhi33 | week | 0.69 | 7.62E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Streptomycetaceae | sf_1 | 1344 Streptomyces species | max TEMP | 0.64 | 5.42E−05 | 2.84E−02
Actinobacteria | Actinobacteria | Actinomycetales | Streptomycetaceae | sf_1 | 1344 Streptomyces species | mean TEMP | 0.62 | 9.56E−05 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Thermomonosporaceae | sf_1 | 1406 Actinomadura kijaniata | week | 0.65 | 2.91E−05 | 2.29E−02
Actinobacteria | Actinobacteria | Actinomycetales | Kineosporiaceae | sf_1 | 1424 Actinomycetaceae SR 139 | max VISIB | 0.6 | 1.70E−04 | 4.59E−02
Actinobacteria | Actinobacteria | Actinomycetales | Kineosporiaceae | sf_1 | 1424 Actinomycetaceae SR 139 | week | 0.62 | 8.03E−05 | 3.50E−02
Actinobacteria | Actinobacteria | Actinomycetales | Intrasporangiaceae | sf_1 | 1445 Ornithinimicrobium humiphilum str. DSM 12362 HKI 124 | week | 0.62 | 9.46E−05 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Unclassified | sf_3 | 1514 uncultured human oral bacterium A11 | week | 0.69 | 7.08E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Pseudonocardiaceae | sf_1 | 1530 Pseudonocardia thermophila str. IMSNU 20112T | max TEMP | 0.64 | 5.10E−05 | 2.79E−02
Actinobacteria | Actinobacteria | Actinomycetales | Pseudonocardiaceae | sf_1 | 1530 Pseudonocardia thermophila str. IMSNU 20112T | mean TEMP | 0.66 | 1.99E−05 | 1.97E−02
Actinobacteria | Actinobacteria | Actinomycetales | Pseudonocardiaceae | sf_1 | 1530 Pseudonocardia thermophila str. IMSNU 20112T | min MINTEMP | 0.61 | 1.10E−04 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Pseudonocardiaceae | sf_1 | 1530 Pseudonocardia thermophila str. IMSNU 20112T | min TEMP | 0.6 | 1.82E−04 | 4.73E−02
Actinobacteria | Actinobacteria | Actinomycetales | Pseudonocardiaceae | sf_1 | 1530 Pseudonocardia thermophila str. IMSNU 20112T | week | 0.73 | 1.15E−06 | 5.92E−03
Actinobacteria | Actinobacteria | Actinomycetales | Cellulomonadaceae | sf_1 | 1592 Lake Bogoria isolate 69B4 | week | 0.61 | 1.15E−04 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Corynebacteriaceae | sf_1 | 1642 Corynebacterium otitidis | max TEMP | 0.62 | 8.87E−05 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Corynebacteriaceae | sf_1 | 1642 Corynebacterium otitidis | mean TEMP | 0.64 | 4.12E−05 | 2.49E−02
Actinobacteria | Actinobacteria | Actinomycetales | Corynebacteriaceae | sf_1 | 1642 Corynebacterium otitidis | min MINTEMP | 0.62 | 1.07E−04 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Corynebacteriaceae | sf_1 | 1642 Corynebacterium otitidis | week | 0.63 | 5.53E−05 | 2.84E−02
Actinobacteria | Actinobacteria | Actinomycetales | Dermabacteraceae | sf_1 | 1736 Brachybacterium rhamnosum LMG 19848T | max TEMP | 0.63 | 6.17E−05 | 3.09E−02
Actinobacteria | Actinobacteria | Actinomycetales | Dermabacteraceae | sf_1 | 1736 Brachybacterium rhamnosum LMG 19848T | mean TEMP | 0.6 | 1.91E−04 | 4.90E−02
Actinobacteria | Actinobacteria | Actinomycetales | Dermabacteraceae | sf_1 | 1736 Brachybacterium rhamnosum LMG 19848T | week | 0.64 | 4.47E−05 | 2.62E−02
Actinobacteria | Actinobacteria | Actinomycetales | Streptomycetaceae | sf_3 | 1743 Streptomyces scabiei str. DNK-G01 | week | 0.6 | 1.60E−04 | 4.38E−02
Actinobacteria | Actinobacteria | Actinomycetales | Nocardiaceae | sf_1 | 1746 Nocardia corynebacteroides | week | 0.66 | 2.48E−05 | 2.21E−02
Actinobacteria | Actinobacteria | Actinomycetales | Unclassified | sf_3 | 1806 French Polynesia: Tahiti clone 23 | max TEMP | 0.65 | 3.37E−05 | 2.29E−02
Actinobacteria | Actinobacteria | Actinomycetales | Unclassified | sf_3 | 1806 French Polynesia: Tahiti clone 23 | mean TEMP | 0.66 | 1.97E−05 | 1.97E−02
Actinobacteria | Actinobacteria | Actinomycetales | Micromonosporaceae | sf_1 | 1821 Catellatospora subsp. citrea str. IMSNU 22008T | max TEMP | 0.61 | 1.10E−04 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Micromonosporaceae | sf_1 | 1821 Catellatospora subsp. citrea str. IMSNU 22008T | mean MINTEMP | 0.61 | 1.22E−04 | 3.72E−02
Actinobacteria | Actinobacteria | Actinomycetales | Micromonosporaceae | sf_1 | 1821 Catellatospora subsp. citrea str. IMSNU 22008T | mean TEMP | 0.67 | 1.76E−05 | 1.97E−02
Actinobacteria | Actinobacteria | Actinomycetales | Micromonosporaceae | sf_1 | 1821 Catellatospora subsp. citrea str. IMSNU 22008T | min MINTEMP | 0.7 | 4.92E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Micromonosporaceae | sf_1 | 1821 Catellatospora subsp. citrea str. IMSNU 22008T | min TEMP | 0.65 | 2.68E−05 | 2.29E−02
Actinobacteria | Actinobacteria | Actinomycetales | Micromonosporaceae | sf_1 | 1821 Catellatospora subsp. citrea str. IMSNU 22008T | week | 0.62 | 8.24E−05 | 3.52E−02
Actinobacteria | Actinobacteria | Rubrobacterales | Rubrobacteraceae | sf_1 | 1892 Sturt arid-zone soil clone 0319-7H2 | min MINTEMP | 0.62 | 8.51E−05 | 3.56E−02
Actinobacteria | Actinobacteria | Rubrobacterales | Rubrobacteraceae | sf_1 | 1892 Sturt arid-zone soil clone 0319-7H2 | week | 0.68 | 9.19E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Actinosynnemataceae | sf_1 | 1984 Saccharothrix tangerinus str. MK27-91F2 | max TEMP | 0.68 | 8.01E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Actinosynnemataceae | sf_1 | 1984 Saccharothrix tangerinus str. MK27-91F2 | mean TEMP | 0.67 | 1.64E−05 | 1.97E−02
Actinobacteria | Actinobacteria | Actinomycetales | Actinosynnemataceae | sf_1 | 1984 Saccharothrix tangerinus str. MK27-91F2 | week | 0.7 | 3.54E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Nocardiaceae | sf_1 | 1999 Rhodococcus fascians str. DFA7 | max TEMP | 0.61 | 1.14E−04 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Propionibacteriaceae | sf_1 | 2023 Propionibacterium propionicum str. DSM 43307T | week | 0.62 | 9.19E−05 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Streptosporangiaceae | sf_1 | 2037 Nonomuraea terrinata str. DSM 44505 | week | 0.61 | 1.13E−04 | 3.63E−02
Firmicutes | Bacilli | Bacillales | Thermoactinomycetaceae | sf_1 | 3619 Thermoactinomyces intermedius str. ATCC 33205T | range MINTEMP | 0.65 | 3.41E−05 | 2.29E−02
Cyanobacteria | Cyanobacteria | Symploca | Unclassified | sf_1 | 5165 Symploca atlantica str. PCC 8002 | week | 0.63 | 6.84E−05 | 3.18E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Crenotrichaceae | sf_11 | 5491 Austria: Lake Gossenkoellesee clone GKS2-106 | mean TEMP | 0.61 | 1.15E−04 | 3.63E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Crenotrichaceae | sf_11 | 5491 Austria: Lake Gossenkoellesee clone GKS2-106 | week | 0.63 | 6.62E−05 | 3.18E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Flexibacteraceae | sf_19 | 5866 Taxeobacter ocellatus str. Myx2105 | week | 0.62 | 1.08E−04 | 3.63E−02
Bacteroidetes | Bacteroidetes | Bacteroidales | Prevotellaceae | sf_1 | 6047 deep marine sediment clone MB-A2-107 | week | 0.62 | 9.52E−05 | 3.63E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Crenotrichaceae | sf_11 | 6171 Bifissio spartinae str. AS1.1762 | max PM2.5 | −0.62 | 9.95E−05 | 3.63E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Crenotrichaceae | sf_11 | 6171 Bifissio spartinae str. AS1.1762 | max VISIB | 0.62 | 1.09E−04 | 3.63E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Crenotrichaceae | sf_11 | 6171 Bifissio spartinae str. AS1.1762 | range PM2.5 | −0.65 | 2.86E−05 | 2.29E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Crenotrichaceae | sf_11 | 6171 Bifissio spartinae str. AS1.1762 | week | 0.61 | 1.25E−04 | 3.76E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 6808 PCB-polluted soil clone WD267 | mean TEMP | 0.63 | 7.73E−05 | 3.45E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 6808 PCB-polluted soil clone WD267 | week | 0.69 | 5.54E−06 | 1.18E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 7132 Sphingomonas sp. K101 | min SLP | 0.64 | 5.16E−05 | 2.79E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 7132 Sphingomonas sp. K101 | week | 0.75 | 2.74E−07 | 2.81E−03
Proteobacteria | Alphaproteobacteria | Bradyrhizobiales | Unclassified | sf_1 | 7255 Pleomorphomonas oryzae str. B-32 | max TEMP | 0.65 | 3.57E−05 | 2.29E−02
Proteobacteria | Alphaproteobacteria | Bradyrhizobiales | Unclassified | sf_1 | 7255 Pleomorphomonas oryzae str. B-32 | mean TEMP | 0.64 | 4.62E−05 | 2.63E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 7344 rhizosphere soil RSI-21 | week | 0.68 | 8.96E−06 | 1.18E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 7411 Sphingomonas adhaesiva | min SLP | 0.66 | 2.01E−05 | 1.97E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 7411 Sphingomonas adhaesiva | week | 0.74 | 6.42E−07 | 4.39E−03
Proteobacteria | Alphaproteobacteria | Rhodobacterales | Rhodobacteraceae | sf_1 | 7527 clone CTD56B | mean TEMP | 0.61 | 1.44E−04 | 4.23E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 7555 derived microbial ‘pearl’-community clone sipK48 | week | 0.6 | 1.60E−04 | 4.38E−02
Proteobacteria | Alphaproteobacteria | Bradyrhizobiales | Methylobacteriaceae | sf_1 | 7593 Methylobacterium organophilum | max TEMP | 0.65 | 3.53E−05 | 2.29E−02
Proteobacteria | Alphaproteobacteria | Bradyrhizobiales | Methylobacteriaceae | sf_1 | 7593 Methylobacterium organophilum | mean TEMP | 0.62 | 9.87E−05 | 3.63E−02
Proteobacteria | Alphaproteobacteria | Bradyrhizobiales | Methylobacteriaceae | sf_1 | 7593 Methylobacterium organophilum | week | 0.68 | 8.06E−06 | 1.18E−02
Proteobacteria | Alphaproteobacteria | Devosia | Unclassified | sf_1 | 7626 Devosia neptuniae str. J1 | week | 0.6 | 1.80E−04 | 4.73E−02
Proteobacteria | Betaproteobacteria | Burkholderiales | Comamonadaceae | sf_1 | 7786 unidentified alpha proteobacterium | mean TEMP | 0.65 | 3.45E−05 | 2.29E−02
Proteobacteria | Betaproteobacteria | Burkholderiales | Burkholderiaceae | sf_1 | 7899 Burkholderia andropogonis | week | 0.65 | 3.43E−05 | 2.29E−02
Proteobacteria | Gammaproteobacteria | Unclassified | Unclassified | sf_3 | 8759 Agricultural soil SC-I-87 | max TEMP | 0.6 | 1.74E−04 | 4.63E−02
Proteobacteria | Gammaproteobacteria | Pseudomonadales | Pseudomonadaceae | sf_1 | 9389 Pseudomonas oleovorans | min SLP | 0.68 | 8.57E−06 | 1.18E−02
Proteobacteria | Gammaproteobacteria | Pseudomonadales | Pseudomonadaceae | sf_1 | 9389 Pseudomonas oleovorans | week | 0.83 | 1.03E−09 | 2.11E−05

^(a)P-value is adjusted for multiple comparisons using a false discovery rate controlling procedure (S18).
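For illustration, a false discovery rate adjustment of the kind labeled "BH" in Table S6 can be sketched in Python as a Benjamini-Hochberg step-up adjustment. This is a minimal sketch, not the procedure of reference S18 itself; the function name `bh_adjust` is hypothetical. The adjusted values depend on the total number of tests performed, which Table S6 does not state, so the sketch is not expected to reproduce the tabulated values from the listed rows alone.

```python
from typing import List, Sequence


def bh_adjust(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A correlation would then be reported as significant when its adjusted value falls below the chosen false discovery rate, for example 0.05.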

TABLE S7 Bacterial sub-families detected (92% or greater of probes in probe set positive) most frequently over 17 week study.

Most frequently detected 16S rRNA gene sequences | AU | SA
Bacteria; Acidobacteria; Acidobacteria; Acidobacteriales; Acidobacteriaceae; sf_14 | 17 | 17
Bacteria; Acidobacteria; Acidobacteria-6; Unclassified; Unclassified; sf_1 | 16 | 17
Bacteria; Acidobacteria; Solibacteres; Unclassified; Unclassified; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Cellulomonadaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Corynebacteriaceae; sf_1 | 16 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Gordoniaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Kineosporiaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Microbacteriaceae; sf_1 | 16 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Micrococcaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Micromonosporaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Mycobacteriaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Nocardiaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Promicromonosporaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Pseudonocardiaceae; sf_1 | 16 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Streptomycetaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Thermomonosporaceae; sf_1 | 16 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Unclassified; sf_3 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Rubrobacterales; Rubrobacteraceae; sf_1 | 16 | 17
Bacteria; Actinobacteria; Actinobacteria; Unclassified; Unclassified; sf_1 | 16 | 17
Bacteria; Actinobacteria; BD2-10 group; Unclassified; Unclassified; sf_2 | 17 | 16
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Unclassified; sf_3 | 16 | 17
Bacteria; Chloroflexi; Anaerolineae; Chloroflexi-la; Unclassified; sf_1 | 16 | 17
Bacteria; Chloroflexi; Anaerolineae; Unclassified; Unclassified; sf_9 | 16 | 17
Bacteria; Chloroflexi; Dehalococcoidetes; Unclassified; Unclassified; sf_1 | 16 | 17
Bacteria; Cyanobacteria; Cyanobacteria; Chloroplasts; Chloroplasts; sf_5 | 17 | 17
Bacteria; Cyanobacteria; Cyanobacteria; Plectonema; Unclassified; sf_1 | 16 | 17
Bacteria; Cyanobacteria; Unclassified; Unclassified; Unclassified; sf_5 | 16 | 17
Bacteria; Firmicutes; Bacilli; Bacillales; Bacillaceae; sf_1 | 17 | 17
Bacteria; Firmicutes; Bacilli; Bacillales; Halobacillaceae; sf_1 | 17 | 17
Bacteria; Firmicutes; Bacilli; Bacillales; Paenibacillaceae; sf_1 | 16 | 17
Bacteria; Firmicutes; Bacilli; Lactobacillales; Enterococcaceae; sf_1 | 17 | 17
Bacteria; Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; sf_1 | 16 | 17
Bacteria; Firmicutes; Catabacter; Unclassified; Unclassified; sf_1 | 16 | 17
Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridiaceae; sf_12 | 17 | 17
Bacteria; Firmicutes; Clostridia; Clostridiales; Lachnospiraceae; sf_5 | 17 | 17
Bacteria; Firmicutes; Clostridia; Clostridiales; Peptococc/Acidaminococc; sf_11 | 17 | 17
Bacteria; Firmicutes; Clostridia; Clostridiales; Peptostreptococcaceae; sf_5 | 17 | 17
Bacteria; Firmicutes; Clostridia; Clostridiales; Unclassified; sf_17 | 16 | 17
Bacteria; Firmicutes; Unclassified; Unclassified; Unclassified; sf_8 | 16 | 17
Bacteria; Nitrospira; Nitrospira; Nitrospirales; Nitrospiraceae; sf_1 | 17 | 16
Bacteria; OP3; Unclassified; Unclassified; Unclassified; sf_4 | 16 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Acetobacterales; Acetobacteraceae; sf_1 | 17 | 16
Bacteria; Proteobacteria; Alphaproteobacteria; Azospirillales; Unclassified; sf_1 | 16 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Beijerinck/Rhodoplan/Methylocyst; sf_3 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Bradyrhizobiaceae; sf_1 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Hyphomicrobiaceae; sf_1 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Methylobacteriaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Ellin314/wr0007; Unclassified; sf_1 | 16 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Bradyrhizobiaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Phyllobacteriaceae; sf_1 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Unclassified; sf_1 | 16 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Rhodobacteraceae; sf_1 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Rickettsiales; Unclassified; sf_1 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_1 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_15 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Unclassified; Unclassified; sf_6 | 17 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Alcaligenaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Burkholderiaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Comamonadaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Oxalobacteraceae; sf_1 | 17 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Ralstoniaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Methylophilales; Methylophilaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Rhodocyclales; Rhodocyclaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Unclassified; Unclassified; sf_3 | 17 | 17
Bacteria; Proteobacteria; Deltaproteobacteria; Syntrophobacterales; Syntrophobacteraceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Epsilonproteobacteria; Campylobacterales; Campylobacteraceae; sf_3 | 17 | 17
Bacteria; Proteobacteria; Epsilonproteobacteria; Campylobacterales; Helicobacteraceae; sf_3 | 17 | 17
Bacteria; Proteobacteria; Epsilonproteobacteria; Campylobacterales; Unclassified; sf_1 | 17 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Alteromonadales; Alteromonadaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Chromatiales; Chromatiaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; sf_6 | 17 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Legionellales; Unclassified; sf_1 | 17 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Legionellales; Unclassified; sf_3 | 16 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Moraxellaceae; sf_3 | 16 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Pseudomonadaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Unclassified; Unclassified; sf_3 | 17 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; Xanthomonadaceae; sf_3 | 17 | 17
Bacteria; TM7; TM7-3; Unclassified; Unclassified; sf_1 | 16 | 17
Bacteria; Unclassified; Unclassified; Unclassified; Unclassified; sf_148 | 16 | 17
Bacteria; Unclassified; Unclassified; Unclassified; Unclassified; sf_160 | 17 | 17
Bacteria; Verrucomicrobia; Verrucomicrobiae; Verrucomicrobiales; Verrucomicrobiaceae; sf_7 | 17 | 17
Number of sub-families detected in all samples over 17 week period | 43 | 80

Italic text indicates sub-families not found in all 17 weeks. AU = Austin, SA = San Antonio.

TABLE S8 Bacterial sub-families containing pathogens of public health and bioterrorism significance and their relatives that were detected in aerosols over the 17 week monitoring period.

Pathogens and relatives | Taxon # | Austin: weeks detected | Austin: % of weeks | San Antonio: weeks detected | San Antonio: % of weeks

Bacillus anthracis
Bacillus cohnii, B. psychrosaccharolyticus, B. benzoevorans | 3439 | 17 | 100.0 | 17 | 100.0
Bacillus megaterium | 3550 | 11 | 64.7 | 12 | 70.6
Bacillus horikoshii | 3904 | 9 | 52.9 | 14 | 82.4
Bacillus litoralis, B. macroides, B. psychrosaccharolyticus | 3337 | 5 | 29.4 | 8 | 47.1
Staphylococcus saprophyticus, S. xylosus, S. cohnii | 3659 | 7 | 41.2 | 15 | 88.2
Bacillus anthracis, cereus, thuringiensis, mycoides + others | 3262 | 0 | 0.0 | 1 | 5.9

Rickettsia prowazekii - rickettsii
Rickettsia australis, R. eschlimannii, R. typhi, R. tarasevichiae + others | 7556 | 2 | 11.8 | 5 | 29.4
Rickettsia prowazekii | 7114 | 0 | 0.0 | 0 | 0.0
Rickettsia rickettsii, R. japonica, R. honei + others | 6809 | 4 | 23.5 | 10 | 58.8

Burkholderia mallei - pseudomallei
Burkholderia pseudomallei, B. thailandensis | 7870 | 10 | 58.8 | 14 | 82.4
Burkholderia mallei | 7747 | 10 | 58.8 | 8 | 47.1
Burkholderia pseudomallei, Burkholderia cepacia, B. tropica, B. gladioli, B. stabilis, B. plantarii + others | 8097 | 13 | 76.5 | 15 | 88.2

Clostridium botulinum - perfringens
Clostridium butyricum, C. baratii, C. sardiniense + others | 4598 | 3 | 17.6 | 10 | 58.8
Clostridium botulinum type C | 4587 | 2 | 11.8 | 4 | 23.5
Clostridium perfringens | 4576 | 1 | 5.9 | 1 | 5.9
Clostridium botulinum type G | 4575 | 3 | 17.6 | 7 | 41.2
Clostridium botulinum types B and E | 4353 | 0 | 0.0 | 0 | 0.0

Francisella tularensis
Tilapia parasite | 9554 | 1 | 5.9 | 2 | 11.8
Francisella tularensis | 9180 | 0 | 0.0 | 0 | 0.0
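
The "% of weeks" values in Table S8 are consistent with dividing the number of weeks in which a taxon was detected by the 17 weeks of monitoring. The following Python sketch illustrates that arithmetic for three rows copied from the table; the function name `percent_of_weeks` is an illustrative assumption and does not appear in the specification.

```python
TOTAL_WEEKS = 17  # length of the monitoring period given in the Table S8 caption


def percent_of_weeks(weeks_detected, total_weeks=TOTAL_WEEKS):
    """Return the share of monitored weeks with a detection, in percent (one decimal)."""
    return round(100.0 * weeks_detected / total_weeks, 1)


# (taxon #, Austin weeks detected, San Antonio weeks detected), copied from Table S8
rows = [(3550, 11, 12), (3904, 9, 14), (7870, 10, 14)]

for taxon, austin_weeks, san_antonio_weeks in rows:
    print(taxon, percent_of_weeks(austin_weeks), percent_of_weeks(san_antonio_weeks))
```

For taxon 3550, for example, 11 of 17 weeks in Austin corresponds to the 64.7% listed in the table.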

TABLE S9 Distribution of array taxa among Bacterial and Archaeal phyla.

Phyla represented on array | Numbers of taxa in phylum

Archaea
Crenarchaeota | 79
Euryarchaeota | 224
Korarchaeota | 3
YNPFFA | 1
Archaeal taxa subtotal | 307

Bacteria
1959 group | 1
Acidobacteria | 98
Actinobacteria | 810
AD3 | 1
Aquificae | 19
Bacteroidetes | 880
BRC1 | 3
Caldithrix | 2
Chlamydiae | 27
Chlorobi | 21
Chloroflexi | 117
Chrysiogenetes | 1
Coprothermobacteria | 3
Cyanobacteria | 202
Deferribacteres | 5
Deinococcus-Thermus | 18
Dictyoglomi | 5
DSS1 | 2
EM3 | 2
Fibrobacteres | 4
Firmicutes | 2012
Fusobacteria | 29
Gemmatimonadetes | 15
LD1PA group | 1
Lentisphaerae | 8
marine group A | 5
Natronoanaerobium | 7
NC10 | 4
Nitrospira | 29
NKB19 | 2
OD1 | 4
OD2 | 6
OP1 | 5
OP10 | 12
OP11 | 20
OP3 | 5
OP5 | 3
OP8 | 8
OP9/JS1 | 12
OS-K | 2
OS-L | 1
Planctomycetes | 182
Proteobacteria | 3170
SPAM | 2
Spirochaetes | 150
SR1 | 4
Synergistes | 19
Termite group 1 | 6
Thermodesulfobacteria | 4
Thermotogae | 15
TM6 | 5
TM7 | 45
Unclassified | 329
Verrucomicrobia | 78
WS1 | 2
WS3 | 7
WS5 | 1
WS6 | 4
Bacterial taxa subtotal | 8434

Total taxa | 8741

EQUIVALENTS

The foregoing written specification is considered to be sufficient to enable one skilled in the art to practice the present embodiments. The foregoing description and Examples detail certain preferred embodiments and describe the best mode contemplated by the inventors. It will be appreciated, however, that no matter how detailed the foregoing may appear in text, the present embodiments may be practiced in many ways and the present embodiments should be construed in accordance with the appended claims and any equivalents thereof.

The term “comprising” is intended herein to be open-ended, including not only the recited elements, but further encompassing any additional elements.

What is claimed is:
 1. A method comprising: applying a sample comprising nucleic acids derived from a plurality of microorganisms representing a plurality of operational taxonomic units (OTUs) to a system, the system comprising (i) a first probe set comprising a plurality of different first nucleic acid probes, each of which is complementary to a 16S rRNA or rDNA sequence that is present only within a first OTU, (ii) a second probe set comprising a plurality of different second nucleic acid probes, each of which is complementary to a 16S rRNA or rDNA sequence that is present in more than one OTU but collectively are present only in a second OTU, wherein the second OTU is different than the first OTU, and (iii) a third probe set consisting of a plurality of different third nucleic acid probes for detecting a third OTU, wherein the third nucleic acid probes are complementary to 16S rRNA or rDNA sequences collectively present in the third OTU and another OTU, and are selected to minimize the number of putative cross-reactive OTUs, wherein the third OTU is different from the second OTU and the first OTU; and detecting hybridization between the nucleic acids derived from the plurality of microorganisms and the first, second and third probe sets, and identifying the presence and/or the quantity of microorganisms from the first, second and third OTUs based on the hybridization.
 2. The method of claim 1, wherein the plurality of microorganisms comprises bacteria or archaea.
 3. The method of claim 1, wherein the system further comprises more than one mismatch probe for each of the nucleic acid probes complementary to 16S rRNA or rDNA sequences, wherein each mismatch probe differs from the nucleic acid probe to which it corresponds at one or more nucleotide bases.
 4. The method of claim 1, wherein the OTUs identified in presence and/or quantity are the most metabolically active organisms in the sample.
 5. A method comprising: applying a sample comprising nucleic acids derived from a plurality of microorganisms representing a plurality of operational taxonomic units (OTUs) to a system, the system comprising: (i) a first probe set comprising at least 11 different first nucleic acid probes, each of said first nucleic acid probes being complementary to a 16S rRNA or rDNA sequence present only in a first OTU, (ii) a second probe set comprising at least 11 different second nucleic acid probes, each of said second nucleic acid probes being complementary to a 16S rRNA or rDNA sequence present in a plurality of OTUs but collectively are present only in a second OTU, wherein the second OTU is different than the first OTU, (iii) a third probe set consisting of at least 11 different third nucleic acid probes for detecting a third OTU, wherein the third nucleic acid probes are complementary to 16S rRNA or rDNA sequences collectively present in the third OTU and another OTU, wherein the third OTU is different from the second OTU and the first OTU, wherein the different third nucleic acid probes are selected to minimize the number of putative cross-reactive OTUs; and detecting hybridization between the nucleic acids derived from the plurality of microorganisms and the first, second and third probe sets, and identifying the presence and/or the quantity of microorganisms from the first, second and third OTUs based on the hybridization.
 6. The method of claim 5, wherein the identification is made with a level of confidence of about 90% or higher.
 7. The method of claim 6, wherein said level of confidence is 95% or higher.
 8. The method of claim 6, wherein said level of confidence is 98% or higher.
 9. The method of claim 1 or 5, wherein said sample is an environmental sample.
 10. The method of claim 1 or 5, wherein said sample is a clinical sample.
 11. The method of claim 10, wherein said clinical sample comprises at least one of tissue, skin, bodily fluid, or blood.
 12. The method of claim 10, wherein said clinical sample is a lung sample, a gut sample, an ear sample, a nose sample, a throat sample, or a digestive system sample.
 13. The method of claim 1 or 5, wherein said plurality of nucleic acid probes are arranged in an array.
 14. The method of claim 1 or 5, wherein said system is capable of detecting about 9000 different OTUs.
 15. The method of claim 1 or 5, wherein said nucleic acids are selected from the group consisting of: DNA, RNA, DNA from amplified products, and rRNA.
 16. The method of claim 1 or 5, wherein all demarcated bacterial and archaeal orders are represented by said nucleic acid probes.
 17. The method of claim 1 or 5, wherein, for a majority of OTUs that can be detected by said system, said nucleic acid probes are complementary to 16S rRNA sequences that have only been identified within a single OTU.
 18. The method of claim 1 or 5, further comprising quantifying rRNA molecules present in said sample.
 19. The method of claim 1 or 5, wherein at least a subset of said first, second or third nucleic acid probes comprise sequences complementary to 16S rRNA or rDNA sequence fragments, wherein only partial gene sequence is known.
 20. The method of claim 1 wherein said system comprises from about 2 to about 200 first, second or third nucleic acid probes complementary to 16S rRNA or rDNA sequences for each of said OTUs.
 21. The method of claim 12, wherein said clinical sample is a gut sample.
 22. The method of claim 1 or 5, wherein each probe is from about 20 to about 30 bp long.
 23. The method of claim 1 or 5, wherein each of the probes is a 25-mer.
 24. The method of claim 1 or 5, wherein said system further comprises a mismatch probe for each of the plurality of probes in the first set of probes, second set of probes and third set of probes, wherein each of the mismatch probes differs from its respective probe at one or more nucleotide bases.
 25. The method of claim 1 or 5, wherein an OTU is considered present in the sample when at least a threshold percentage of a plurality of probe pairs assigned to it are positive.
 26. The method of claim 25, wherein the threshold percentage is >92%.
 27. The method of claim 1 or 5, wherein an OTU consists of sequences having up to 3% sequence divergence.
 28. A method comprising: applying a sample comprising nucleic acids derived from a plurality of microorganisms representing a plurality of operational taxonomic units (OTUs) to a system, the system comprising (i) a first probe set comprising a plurality of different first nucleic acid probes, each of which is complementary to a 16S rRNA or rDNA sequence that is present only within a first OTU, (ii) a second probe set comprising a plurality of different second nucleic acid probes, each of which is complementary to a 16S rRNA or rDNA sequence that is present in more than one OTU but collectively are present only in a second OTU, wherein the second OTU is different than the first OTU, and (iii) a third probe set consisting of a plurality of different third nucleic acid probes for detecting a third OTU, wherein the third nucleic acid probes are complementary to 16S rRNA or rDNA sequences collectively present in the second OTU and the third OTU, and are selected to minimize the number of putative cross-reactive OTUs, wherein the third OTU is different from the second OTU and the first OTU; and detecting hybridization between the nucleic acids derived from the plurality of microorganisms and the first, second and third probe sets, and identifying the presence and/or the quantity of microorganisms from the first, second and third OTUs based on the hybridization.
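
By way of illustration only, the presence call recited in claims 25 and 26 (an OTU is considered present when the percentage of positive probe pairs assigned to it exceeds a threshold, such as >92%) can be sketched in Python as follows. The sketch assumes each probe pair has already been scored as positive or negative; the scoring of an individual pair is not restated in the claims and is not modeled here. The function name `otu_present` and the example inputs are hypothetical.

```python
def otu_present(pair_is_positive, threshold_fraction=0.92):
    """Call an OTU present when the fraction of positive probe pairs exceeds the threshold.

    pair_is_positive: list of booleans, one per probe pair assigned to the OTU
    (assumed to be scored upstream; the scoring rule is outside this sketch).
    threshold_fraction: 0.92 mirrors the ">92%" threshold of claim 26.
    """
    if not pair_is_positive:
        raise ValueError("an OTU needs at least one assigned probe pair")
    positive_fraction = sum(pair_is_positive) / len(pair_is_positive)
    return positive_fraction > threshold_fraction


# Hypothetical OTU with 11 probe pairs (claim 5 recites at least 11 probes per set)
all_positive = [True] * 11            # 11 of 11 pairs positive
one_negative = [True] * 10 + [False]  # 10 of 11 pairs positive, about 90.9%

print(otu_present(all_positive))
print(otu_present(one_negative))
```

With only 11 probe pairs, a single negative pair drops the positive fraction to about 90.9%, below the >92% threshold, so in this sketch such an OTU would be called present only when every pair is positive.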